proDataMarket presentation at "Spatial Data on The Web" (dapaasproject)
Presentation at the "Spatial Data on The Web" event, 10th of February 2016, Amersfoort, The Netherlands
http://www.pilod.nl/wiki/Geodata_on_The_Web_Event_10_February_2016
Continuous Intelligence: Keeping your AI Application in Production (Dr. Arif Wider)
A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world, Continuous Delivery is rarely applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence: the practice of delivering AI applications continuously.
Industry@RuleML2015: Norwegian State of Estate: A Reporting Service for the St... (RuleML)
Data distribution
• Public and private
• Data complexity
• Rich in attributes and location based
• Time dimension
• Example of data model from the Norwegian Mapping Authority
Denodo DataFest 2017: Business Needs for a Fast Data Strategy (Denodo)
Fast Data is an absolute business need: a business requires access to the most up-to-date data in real time to perform its function best and stay ahead of the competition. At Altus Analytics, the faster we can turn around the data, the faster we can accelerate the generation of revenue. Watch the live presentation here: https://goo.gl/ugkYTP
Watch this Denodo DataFest 2017 session to discover:
• Why fast data is a key business imperative.
• How to architect your modern data architecture to enable fast data.
• How Altus Analytics built a 3-tier modular architecture to tackle particular BI data modelling challenges.
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins (Data Con LA)
Abstract: Companies are adopting big data to perform high-velocity, real-time analytics on very large volumes of data, enabling rapid analysis for business users through self-service and never-before-realized use cases. However, such projects have yielded limited value because these big data systems have become siloed from the rest of the enterprise systems holding critical business operational data. Big Data Fabric is a modern data architecture combining data virtualization, data prep, and lineage capabilities to seamlessly integrate, at scale, these huge, siloed volumes of structured and unstructured data with other enterprise data assets. This presentation will demonstrate, through proven customer case studies in big data and IoT, the value of using a big data fabric as a logical data lake for big data analytics.
In this presentation, you will find an explanation of the ETL process, which stands for Extract, Transform, and Load. This process is used to extract data from different sources, transform it into a suitable form, and load it into a data warehouse. The presentation then covers the Data Junction tool, which is used in the ETL process.
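The extract-transform-load flow described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the Data Junction tool; an in-memory CSV plays the role of the source and SQLite plays the role of the warehouse:

```python
# Minimal ETL sketch (illustrative only; not the Data Junction tool).
import csv
import io
import sqlite3

# Extract: read rows from a source; an in-memory CSV stands in for any
# external resource (file, API, database export).
source = io.StringIO("id,amount\n1, 10.5 \n2, 20.0 \n")
rows = list(csv.DictReader(source))

# Transform: strip whitespace and cast fields into a suitable form.
cleaned = [(int(r["id"]), float(r["amount"].strip())) for r in rows]

# Load: insert the transformed rows into the warehouse table;
# an in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)

total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 30.5
```

Real pipelines differ mainly in scale and scheduling, but the three phases stay the same.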
DKAN Drupal Distribution Presentation at Drupal Gov Days 2013 (Andrew Hoppin)
Slides from a presentation at Drupal Gov Days 2013 (http://drupalgovdays2013.org/content/dkan-drupal-distro-open-data) about the DKAN Open Data distribution of Drupal
Increasing Agility Through Data Virtualization (Denodo)
During the Data Summit Conference in New York, our CMO Ravi Shankar and BJ Fesq, Chief Data Officer at CIT Group, discussed the modernization of data architectures with data virtualization.
This presentation explores how data virtualization is being used to dramatically reduce data proliferation and ensure that all consumers are working with a single source of the truth. It also looks at how data virtualization can drive standardization, measure and improve data quality, abstract data consumers from data providers, expose data lineage, enable cross-company data integration, and serve as a common provisioning point from which to access all authoritative sources of data.
Minimizing the Complexities of Machine Learning with Data Virtualization (Denodo)
Watch full webinar here: https://buff.ly/309CZ1Y
Advanced data science techniques like machine learning have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this webinar and learn:
* How data virtualization can accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
* How popular tools from the data science ecosystem (Spark, Python, Zeppelin, Jupyter, etc.) integrate with Denodo
* How you can use the Denodo Platform with large data volumes in an efficient way
* About the success McCormick has had as a result of seasoning the Machine Learning and Blockchain landscape with data virtualization
FENIX platform: Overview of the New Software Platform and System Setup (FAO)
http://www.countrystat.org
What is FENIX?
A collection of software tools, methods, and standards to facilitate the acquisition, management, and analysis of large, diversified, and distributed sets of data.
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery (Denodo)
In this presentation, Dave Kay, Data Consultant within the Analytics and Architecture group at Zurich Insurance, explains how Zurich is modernizing their data infrastructure using data virtualization to accelerate delivery of mortgage insurance and intra-day operational reports to business analysts, salespeople, underwriters, managers, and actuarial staff.
This presentation is part of the Fast Data Strategy Conference; you can watch the video here: goo.gl/GLPPg2.
Data Virtualization: The Agile Delivery Platform (Denodo)
Watch full webinar here: https://goo.gl/2wNBhg
To grow or compete in today's fast-paced business environment, you need a robust, agile, and cost-effective data-driven decision strategy.
However, many companies are struggling with the growing complexity of data integration projects as they try to manage the increasing volumes and types of data from traditional enterprise sources as well as new sources such as big data, machine data, social media, and the cloud.
Data virtualization is the technology that simplifies and reduces the costs of your data integration projects.
Watch this webinar in which we explore:
• How data virtualization lets you provide the business with the information it needs to make better decisions faster.
• How you can connect and combine all your data in real-time, without compromising on scalability, security or governance.
Denodo DataFest 2017: Conquering the Edge with Data Virtualization (Denodo)
Watch the live session on-demand: https://goo.gl/qAL3Q7
No time like the present! That's one reason why edge analytics continues to grow in value and importance. With the right analytic architecture in place, companies can not only identify opportunities at the edge, they can also take appropriate action.
Watch this Denodo DataFest 2017 session to discover:
• The growing importance of edge computing in IoT
• How data virtualization plays a critical role in enabling edge analytics
• How Denodo’s innovative customers exploit the edge for a winning business model
Ran van den Boom, 30 November - 1 December 2021
Webinar: Modern data integration systems: the experience of Istat and other actors
Title: Data Virtualization at Statistics Netherlands
VTA presented this report on public input for Envision Silicon Valley to the Ad Hoc Committee on Envision Silicon Valley in February 2016. For more information about the program, visit http://www.vta.org/envision
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization (Denodo)
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI, and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization, a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with a proper security and governance structure in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) (Denodo)
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations, whether they realise it or not, have a multi-cloud strategy. Single, hybrid, or multi-cloud… the potential benefits are huge: flexibility, agility, cost savings, scaling on demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
Unlock Your Data for ML & AI using Data Virtualization (Denodo)
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not replace data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ The two are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Watch Paul's session from Fast Data Strategy on-demand here: https://goo.gl/3veKqw
"Through 2020, 50% of enterprises will implement some form of data virtualization as one enterprise production option for data integration" according to Gartner. It is clear that data virtualization has become a driving force for companies to implement an agile, real-time and flexible enterprise data architecture.
Attend this session to learn:
• What data virtualization actually means and how it differs from traditional data integration approaches
• The most important use cases and key patterns of data virtualization
• The benefits of data virtualization
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat... (Denodo)
Watch full webinar here: https://bit.ly/3xj6fnm
Presented at Chief Data Officer Live 2021 A/NZ
The world is changing faster than ever, and for companies to compete and succeed, they need to be agile in order to respond quickly to market changes and emerging opportunities. Data plays an integral role in achieving this business agility. However, given the complex nature of enterprise data architectures, finding and analysing data is an increasingly challenging task. Data virtualization is a modern data integration technique that integrates data in real time, without having to physically replicate it.
Watch this session on demand to understand what data virtualization is and how it:
- Delivers data in real-time, and without replication
- Creates a logical architecture to provide a single view of truth
- Centralises the data governance and security framework
- Democratises data for faster decision making and business agility
Bridging the Last Mile: Getting Data to the People Who Need It (Denodo)
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Data Virtualization: Challenges, Uses & Benefits (Denodo)
Watch full webinar here: https://bit.ly/3oah4ng
Gartner recently described data virtualization as a cornerstone of data integration architectures.
Discover:
- The benefits of a data virtualization platform
- The multiplication of use cases: Lakehouse, Data Science, Big Data, Data Services & IoT
- The creation of a unified view of your data assets without compromising on performance
- The construction of an agile data integration architecture: on-premise, in the cloud, or hybrid
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC) (Denodo)
Watch full webinar here: https://bit.ly/3aePFcF
Historically, data lakes have been created as centralized physical data storage platforms for data scientists to analyze data. But lately, the explosion of big data, data privacy rules, and departmental restrictions, among many other things, have made the centralized data repository approach less feasible. In this webinar, we will discuss why decentralized multipurpose data lakes are the future of data analysis for a broad range of business users.
Attend this session to learn:
- The restrictions of physical single-purpose data lakes
- How to build a logical multi-purpose data lake for business users
- The newer use cases that make multi-purpose data lakes a necessity
A Logical Architecture is Always a Flexible Architecture (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/3joZa0a
The current data landscape is fragmented, not just in location but also in terms of processing paradigms: data lakes, IoT architectures, NoSQL and graph data stores, SaaS applications, etc. coexist with relational databases to fuel the needs of modern analytics, ML, and AI. The physical consolidation of enterprise data into a central repository, although possible, is both expensive and time-consuming. A logical data warehouse is a modern data architecture that allows organizations to leverage all of their data irrespective of where the data is stored, what format it is stored in, and what technologies or protocols are used to store and access it.
Watch this session to understand:
- What a logical data warehouse is and how to architect one
- The benefits of a logical data warehouse: speed with agility
- A customer use case depicting a logical architecture implementation
Multi-Cloud Integration with Data Virtualization (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/3corOL4
More and more organizations are adopting multi-cloud strategies to provide greater flexibility, cost savings, and performance optimization. Even when organizations commit to a single cloud provider, they often have data and applications spread across different cloud regions to support different business units or geographies. The result of this is a highly distributed infrastructure that makes finding and accessing the data needed for reporting and analytics even more challenging.
The Denodo Platform Multi-Location Architecture provides quick and easy managed access to data while still providing local control to the 'data owners' and complying with local privacy and data protection regulations (think GDPR and CCPA).
In this on-demand webinar, you will learn about:
- The challenges facing organizations as they adopt multi-cloud data strategies
- How the Denodo Platform provides a managed data access layer across the organization
- The different multi-location architectures that can maximize local control over data while still making it readily available
- How organizations have benefited from using the Denodo Platform as a multi-cloud data access layer
How a Logical Data Fabric Enhances the Customer 360 View (Denodo)
Watch full webinar here: https://bit.ly/3GI802M
Organisations have struggled for years to understand their customers, mainly because the right data has not been available at the right point in time. In this session, we will discuss the role of data virtualization in providing a 360-degree customer view and look at some of the success stories our customers have told us about.
An overview of datonixOne, a new, evolutionary, and effective data preparation platform.
We introduce a new, disruptive technology into the data management space: the Data Scanner.
Using the Data Scanner, data science is more accurate and feasible.
datonixOne is a perfect satellite of any enterprise data hub.
Bridging the Last Mile: Getting Data to the People Who Need It (APAC) (Denodo)
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Using Data Platforms That Are Fit-For-Purpose (DATAVERSITY)
We must grow the data capabilities of our organizations to fully deal with the many and varied forms of data. This cannot be accomplished without an intense focus on the many and growing technical bases that can be used to store, view, and manage data. There are more such options now than ever before, and many have merit in organizations today.
This session sorts out the valuable data stores, how they work, what workloads they are good for, and how to build the data foundation for a modern competitive enterprise.
Speaking to your data is similar to speaking any other language: it starts with understanding the basic terminology and describing key concepts. This presentation will focus on the key steps that are critical to learning the foundation of speaking data.
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to changes.
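The "data product" idea at the heart of a data mesh can be illustrated with a small, hypothetical sketch: each domain owns its access logic and publishes it to a shared catalog, so consumers never touch the domain's internal storage. All names here (DataProduct, catalog, publish) are invented for illustration and come from no particular framework:

```python
# Hypothetical sketch of the data-mesh "data product" idea; the names
# DataProduct, catalog, and publish are invented for illustration.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DataProduct:
    domain: str                      # the autonomous domain that owns the data
    name: str                        # product identifier within the domain
    fetch: Callable[[], List[dict]]  # domain-owned access logic

# A shared, discoverable catalog instead of a central data repository.
catalog: Dict[str, DataProduct] = {}

def publish(product: DataProduct) -> None:
    """A domain team registers its data product for the rest of the organization."""
    catalog[f"{product.domain}.{product.name}"] = product

# Two autonomous domains expose their own data, however they store it.
publish(DataProduct("sales", "orders", lambda: [{"order_id": 1, "total": 99.0}]))
publish(DataProduct("marketing", "campaigns", lambda: [{"campaign": "spring"}]))

# A consumer discovers a product and reads it through the uniform interface,
# without knowing where or how the owning domain stores the data.
orders = catalog["sales.orders"].fetch()
print(orders[0]["total"])  # 99.0
```

The point of the sketch is the division of responsibility: domains stay autonomous behind their fetch logic, while the catalog gives the organization one place to discover products.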
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users (Denodo)
Watch full webinar here: https://buff.ly/2Mt555e
Historically, data lakes have been created as centralized physical data storage platforms for data scientists to analyze data. But lately, the explosion of big data, data privacy rules, and departmental restrictions, among many other things, have made the centralized data repository approach less feasible. In his recent whitepaper, renowned analyst Rick F. Van Der Lans discusses why decentralized multi-purpose data lakes are the future of data analysis for a broad range of business users.
Please attend this session to learn:
• The restrictions of physical single-purpose data lakes
• How to build a logical multi-purpose data lake for business users
• The newer use cases that make multi-purpose data lakes a necessity
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Similar to DataGraft: Data-as-a-Service for Open Data (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
2. Outline
• What is DataGraft
• DataGraft in SmartOpenData
  – TRAGSA and ARPA Data Publishing
• DataGraft for Property Data
3. Developed to allow data workers to manage their data in a simple, effective, and efficient way
Powerful data transformation and reliable data access capabilities
4. Data Transformation and RDF Publication Process
• Interactive design of transformations?
• Repeatable transformations?
• Reuse/share transformations (user-based access)?
• Cloud-based deployment of transformations?
• Self-serviced process?
• Data and Transformation as-a-Service?
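The "repeatable transformations" idea above can be sketched in a few lines: a transformation is a pure function from tabular input to RDF output, so re-running it on an updated file yields updated triples. This is an illustrative stdlib-only sketch, not DataGraft's actual implementation; the base URI, column-to-property mapping, and record numbering scheme are invented for the example.

```python
import csv
import io

def csv_to_ntriples(csv_text, base_uri, type_uri):
    """Repeatable transformation: each CSV row becomes one RDF resource,
    each column a literal-valued property (N-Triples output)."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        subject = f"<{base_uri}/record/{i}>"
        triples.append(
            f"{subject} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <{type_uri}> ."
        )
        for column, value in row.items():
            predicate = f"<{base_uri}/property/{column}>"
            # Escape backslashes and quotes per N-Triples literal rules
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return "\n".join(triples)

# Example: a tiny two-row property dataset
data = "parcel_id,area_m2\nP-001,5400\nP-002,1200\n"
nt = csv_to_ntriples(data, "http://example.org", "http://example.org/Parcel")
print(nt.splitlines()[0])
```

Because the mapping lives in code rather than in manual edits, the same transformation can be re-applied whenever the source spreadsheet changes.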
6. DataGraft key feature: Flexible management and sharing of data and transformations
• Fork, reuse, and extend transformations built by other professionals from DataGraft's transformations catalogue
• Interactively build, modify, and share data transformations
• Share transformations privately or publicly
• Reuse transformations to repeatably clean and transform spreadsheet data
• Programmatically access transformations and the transformation catalogue
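Programmatic access to the transformations catalogue would typically look like an authenticated REST call. The endpoint path, payload shape, and auth scheme below are invented for illustration only; DataGraft's real API may differ, so treat this as a sketch of the pattern, not the platform's documented interface. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical catalogue URL: the real DataGraft API path may differ.
CATALOG_URL = "https://datagraft.net/api/transformations"

def build_fork_request(transformation_id, api_key):
    """Build (but do not send) a request that forks a shared transformation
    from the catalogue into the caller's own workspace."""
    url = f"{CATALOG_URL}/{transformation_id}/fork"
    return urllib.request.Request(
        url,
        data=json.dumps({"visibility": "private"}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_fork_request("csv-cleanup-v2", "MY_API_KEY")
print(req.full_url)
```

Sending it would be a one-liner (`urllib.request.urlopen(req)`), omitted here since the endpoint is hypothetical.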
7. DataGraft key feature: Reliable data hosting and querying services
• Host data on DataGraft's reliable, cloud-based triplestore
• Share data privately or publicly
• Query data through your own SPARQL endpoint
• Programmatically access the data catalogue
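Querying hosted data "through your own SPARQL endpoint" follows the standard SPARQL Protocol, so any HTTP client works. A minimal stdlib sketch, assuming only a generic endpoint URL (the example URL is a placeholder, not a real DataGraft endpoint); the request is built but not sent.

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request, asking for JSON results.
    `endpoint` is whatever SPARQL endpoint URL the platform assigns you."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 10
"""
req = build_sparql_request("https://example.org/sparql", query)
print(req.full_url[:60])
# To execute: urllib.request.urlopen(req).read() returns the JSON bindings
```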
17. DataGraft in SmOD: Use Cases
TRAGSA Pilot
• Number of transformations: 42
  – Created via reuse: 25
• Number of triples: ~7.7M
ARPA Pilot
• Number of transformations: 5
  – Created via reuse: 2
• Number of triples: ~14K
18. DataGraft in SmOD: Preliminary observations
• Positive aspects
  – Forking/reusing transformations helped us spend less time creating new transformations
  – Being able to edit the parameters of each transformation step, and to change the step order at any point while building the transformation, made it easier to:
    o Create transformations in general
    o Correct mistakes made in transformation steps
    o Try the effects of transformation steps with different parameters
  – Custom code as utility functions provided flexibility in reusing functions across transformations
• Data cleaning lacked some "nice to have" functionality, e.g. joining or sorting datasets
  – This was overcome with some preprocessing of the input files (e.g. 27 of 43 files needed initial preprocessing in the TRAGSA pilot)
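The join/sort preprocessing mentioned above can be done with a short script before files are loaded into the transformation tool. This is an illustrative stdlib-only sketch with invented column names and data, not the actual TRAGSA preprocessing code.

```python
import csv
import io

def sort_and_left_join(left_text, right_text, key):
    """Pre-sort one CSV by `key` and left-join a second CSV onto it,
    emulating the join/sort preprocessing done before loading input files."""
    left = list(csv.DictReader(io.StringIO(left_text)))
    right = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    joined = []
    for row in sorted(left, key=lambda r: r[key]):
        extra = right.get(row[key], {})
        # Merge matching columns from the right-hand file (key column excluded)
        joined.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return joined

# Hypothetical example data
parcels = "parcel_id,area\nP-2,120\nP-1,540\n"
owners = "parcel_id,owner\nP-1,TRAGSA\n"
rows = sort_and_left_join(parcels, owners, "parcel_id")
print(rows[0])
```

Rows without a match in the second file simply keep their original columns, so the preprocessing step never drops data.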
19. DataGraft for Property Data
Why property data?
• One of the most valuable datasets managed by governments worldwide
• Extensively used in various domains by private and public organizations
20. Some challenges in working with property data
• Difficult to access
• Cross-sector data
• Highly heterogeneous and possibly large data
• Data quality issues
• Time-consuming integration
• Lack of innovation
• …
http://prodatamarket.eu
21. DataGraft – 1 package, 2 audiences
• Data publishers: helping them publish open data
• Application developers: giving them better, easier tools
22. DataGraft – targeted impacts
• Reduction in costs for organisations (e.g. SMEs, public organisations) that lack sufficient expertise and resources to publish open data
• Reduction in the dependency of open data publishers on generic cloud platforms to build, deploy, and maintain their open/linked data from scratch
• Increase in the speed of publishing new datasets and updating existing datasets
• Reduction in the cost and complexity of developing applications that use open data
• Increase in the reuse of open data, by providing reliable access to numerous open datasets for applications hosted on DataGraft.net
23. Summary
• DataGraft – an emerging as-a-Service solution for making Open (Linked) Data more accessible
  – Platform, portal, methodology, APIs
  – Developed/operated by DaPaaS, with contributions from SmOD, proDataMarket, OpenCube
  – Successfully applied in SmOD for two pilot cases
• Key features:
  – Support for sharable/repeatable/reusable data transformations
  – Reliable RDF Database-as-a-Service