This document discusses current themes and trends in data-centric architecture. It identifies several inhibitors to widespread data sharing across organizations, including data feudalism and regulatory challenges. Emerging opportunities include using semantic standards and knowledge graphs to enable digital twins and autonomous agents through interoperable data collaboration. Adopting decentralized architectures and federated storage models can further break down data silos. However, widespread adoption faces obstacles, including the need for organizations to become truly data-centric first.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Graph technology has truly burst onto the scene with diverse new products and services, proving that graph is relevant and that not all graph use cases are equal. Previously relegated to niche implementations and science projects, graph now finds itself deployed as the foundational technology for enterprise analytics solutions and enterprise Data Fabric strategies. It is no surprise that many are calling 2018 “The Year of the Graph”.
Transformer Architectures in Vision
[2018 ICML] Image Transformer
[2019 CVPR] Video Action Transformer Network
[2020 ECCV] End-to-End Object Detection with Transformers
[2021 ICLR] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Modern Data Warehousing with the Microsoft Analytics Platform System, by James Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formerly called Parallel Data Warehouse, or PDW), which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition, I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Codetecon #KRK 3 - Object detection with Deep Learning, by Matthew Opala
There’s been enormous progress in object detection algorithms. From multi-stage methods like R-CNN to end-to-end ones like SSD and YOLO, the accuracy of these methods has improved significantly. Current applications include pedestrian detection for cars and face detection on Facebook.
But that’s just the beginning. I am going to present the algorithms for solving the problem, show what’s currently possible, and preview what will be possible in the near future.
How to Choose the Right Database for Your Workloads, by InfluxData
Learn how to make the right choice for your workloads with this walkthrough of a set of distinct database types (graph, in-memory, search, columnar, document, relational, key-value, and time series databases). In this webinar, we will review the strengths and qualities of each database type from their particular use-case perspectives.
Data Quality Patterns in the Cloud with Azure Data Factory, by Mark Kromer
This is my slide presentation from Pragmatic Works' Azure Data Week 2019: Data Quality Patterns in the Cloud with Azure Data Factory using Mapping Data Flows
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as tracking experiments, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty doing a quick ML project using MLflow and releasing it to production to understand the MLOps lifecycle.
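As a minimal, hedged sketch of the Tracking and Models components described above (the experiment name, model, and metric are illustrative assumptions, not from the session):

```python
# Sketch: track one training run with MLflow, logging params, metrics, and a model.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)                   # Tracking: parameters
    mlflow.log_metric("train_acc", model.score(X, y))   # Tracking: metrics
    mlflow.sklearn.log_model(model, "model")            # Models: reusable artifact
```

Runs logged this way appear in the MLflow UI (`mlflow ui`), which is where the experiment-tracking and versioning pain points the session mentions get addressed.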
stackconf 2022: Introduction to Vector Search with Weaviate, by NETWAYS
In machine learning, e.g., in recommendation tools or data classification, data is often represented as high-dimensional vectors. These vectors are stored in so-called vector databases. With vector databases you can efficiently run search, ranking, and recommendation algorithms; as a result, vector databases have become the backbone of ML deployments in industry. This session is all about vector databases. If you are a data scientist or a data/software engineer, this session is for you. You will learn how you can easily run your favourite ML models with the vector database Weaviate, and you will get an overview of what a vector database like Weaviate can offer: semantic search, question answering, data classification, named entity recognition, multimodal search, and much more. After this session, you will be able to load in your own data and query it with your preferred ML model!
Session outline
What is a vector database?
You will learn the basic principles of vector databases: how data is stored and retrieved, and how that differs from other database types (SQL, knowledge graphs, etc.).
Performing your first semantic search with the vector database Weaviate.
In this phase, you will learn how to set up a Weaviate vector database, how to make a data schema, how to load in data, and how to query data. You can follow along with examples, or you can use your own dataset.
Advanced search with the vector database Weaviate.
Finally, we will cover other functionalities of Weaviate: multi-modal search, data classification, connecting custom ML models, etc.
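As a minimal sketch of the first steps in that outline, using the Weaviate Python client (v3-style API; the endpoint, class name, and vectorizer module below are assumptions for illustration):

```python
# Sketch: define a schema class, load one object, and run a semantic (near-text) query.
import weaviate

client = weaviate.Client("http://localhost:8080")  # assumes a local Weaviate instance

client.schema.create_class({
    "class": "Article",                        # hypothetical class
    "vectorizer": "text2vec-transformers",     # assumes this module is enabled
    "properties": [{"name": "title", "dataType": ["text"]}],
})

client.data_object.create({"title": "Vector databases in production"}, "Article")

result = (
    client.query.get("Article", ["title"])
    .with_near_text({"concepts": ["semantic search"]})  # vector-similarity query
    .do()
)
print(result)
```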
Organizations are struggling to manually classify and inventory distributed, heterogeneous data assets in order to deliver value. However, the new Azure service for enterprises, Azure Synapse Analytics, is poised to help organizations fill the gap between data warehouses and data lakes.
Vision Transformer (ViT) / An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, by changedaeoh
Without using any of the convolutional layers that dominate computer vision, this work takes the pure Transformer architecture proposed in NLP as-is and builds an SOTA-level image classification model using only attention and plain feed-forward NNs.
TAVE research seminar presentation materials, 2021-03-30
Presenter: Oh Changdae
Data Lakehouse Symposium | Day 1 | Part 2, by Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
FAIR data: Superior data visibility and reuse without warehousing, by Alan Morrison
The advantages of semantic knowledge graphs over data warehousing when it comes to scaling quality, contextualized data for machine learning and advanced analytics purposes.
Scaling the mirrorworld with knowledge graphs, by Alan Morrison
After registration at https://www.brighttalk.com/webcast/9273/364148, you can view the full recording, which begins with Scott Abel's intro for a few minutes, then my talk for 20 minutes, and then Sebastian Gabler's. First presented on October 23 at an SWC webinar.
Conclusions:
(1) The mirrorworld (a world of digital twins, which will be 25 years in the making, according to Kevin Kelly) will require semantic knowledge graphs for interaction and interoperability.
(2) This fact implies massive future demand for knowledge graph technology and other new data infrastructure innovations, comparable to the scale of oil & gas industry infrastructure development over 150 years.
(3) Conceivably, knowledge graphs could be used to address a $205 billion market demand by 2021 for graph databases, information management, digital twins, conversational AI, virtual assistants, and knowledge bases/accelerated training for deep learning. The problem is that awareness of the tech is low, and the semantics community that understands it is still quite small.
(4) Over the next decades, knowledge graphs promise both scalability and substantial efficiencies in enterprises. But lack of awareness of their potential and how to harness them will continue to be a stumbling block to adoption.
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization, by Denodo
Watch full webinar here: https://bit.ly/3DBA4EP
A data mesh architecture offers a lot of promise to change the way we manage data – and for the better. But there’s a lot of confusion about a data mesh. People will tell you that you can build a data mesh on top of a data lake or on top of a data warehouse, and that you don’t need data virtualization to build a data mesh.
Many vendors are jumping onto the data mesh bandwagon, claiming that they inherently support a data mesh architecture. But do they? How much of this is hype versus reality? Is it true that you don’t need data virtualization to build a scalable, enterprise-grade data mesh?
This is the myth we will attempt to bust in this next Myth Busters webinar.
Watch this session on-demand to learn about the concepts and components of a data mesh, and hear how the logical approach to data management and integration – powered by data virtualization - is critical for a data mesh.
This document includes lecture/workshop notes for the BIG DATA SCIENCE workshop at NTI, 6-7 December 2017.
Notes: 1. This is an initial version, and it will be updated.
2. Telecommunication/5G parts were not covered during the workshop, although I will add a comprehensive analysis of the mentioned cases.
If anyone is interested in working hands-on through the mentioned case study, just drop me an e-mail: m.rahm7n@gmail.com
Agile Data Management with Enterprise Data Fabric (ASEAN), by Denodo
Watch full webinar here: https://bit.ly/3juxqaw
In a world where machine learning and artificial intelligence are changing our everyday lives, digital transformation tops the strategic agenda in many private and government organizations. Data is becoming the lifeblood of a company, flowing seamlessly through it to enable deep business insights, create new opportunities, and optimize operations.
Chief Data Officers and Data Architects are under continuous pressure to find the best ways to manage the overwhelming volumes of the data that tend to become more and more distributed and diverse.
Moving data physically to a single location for reporting and analytics is no longer an option, a fact accepted by the majority of data professionals.
Join us for this webinar to learn about modern virtual data landscapes, including:
- Virtual Data Fabric
- Data Mesh
- Multi-Cloud Hybrid architecture
- and to learn how to leverage the Denodo Data Virtualization platform to implement these modern data architectures.
Ever wonder how these concepts contrast with and yet complement each other in a next-generation system?
Enterprise semantics
Knowledge graphs
Model-driven development
Digital twins
Self-Sovereign Identity
Own your own data
Data deduplication
Autonomous agents
Large language systems
Data-Centric Architecture combines the major technologies behind each of these concepts. In fact, it’s essential to the real-world implementation of general AI, enabling the context that’s behind contextual computing, DARPA’s Third Phase of AI. To be able to deliver, DCA needs to simplify and scale data ecosystems using these pieces of the data ecosystem puzzle.
This talk will provide an overview of how these pieces of the data-centric puzzle are fitting together. It’s a best practice to see how these pieces can fit together side-by-side in an enterprise context and to envision next-gen systems from the viewpoint of some of the most demanding enterprise use cases.
It’s also best practice to study how one industry vertical is moving ahead and contrast that progress with your own industry. Remember, as the data-centric ecosystem emerges and the benefits of true digitization start to pay off, many more techniques can be borrowed from other verticals and used in your own vertical. This talk will summarize several powerful recent case studies and highlight the key takeaways.
Data centric business and knowledge graph trends, by Alan Morrison
The deck for my kickoff keynote at the Data-Centric Architecture Forum, February 3, 2020. Includes related data, content, and architecture definitions and fundamental explanations, knowledge graph trends, market outlook, transformation case studies and benefits of large-scale, cross-boundary integration/interoperation.
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization, by Denodo
Watch full webinar here: https://buff.ly/2HMdbUp
Data virtualization started to evolve as the most agile and real-time enterprise data fabric; it is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is,
• How it differs from other enterprise data integration technologies
• Real-world examples of data virtualization in action from companies such as Logitech, Autodesk and Festo.
3 Reasons Data Virtualization Matters in Your Portfolio, by Denodo
Watch the full session on-demand here: https://goo.gl/upxC5W
Real-Time Analytics for Big Data, Cloud & Self-Service BI
The world of data is only becoming distributed. Privacy, regulations, and the need for real-time decisions are challenging organizations’ legacy information strategy. This webinar will include an expert panel discussion on Logical Data Warehouse, Universal Semantic Layer, and Real-time Analytics by Paul Moxon (VP of Data Architectures), Pablo Alvarez (Director of Product Management), and Alberto Pan (CTO).
Attend and learn:
• The major challenges of legacy information strategies.
• How data virtualization can help you overcome these challenges.
• Strategies for enabling agile data management and analytics.
Five Key Focus Areas for New-Age Collaboration, by Cognizant
Virtually any enterprise can benefit greatly from deploying the collaboration technologies and systems of new-age organizations. We review the tools and activities available for collaborating on five levels: within a program, within an organization, between channels, within the extended enterprise, and with the external world.
The FAIR data movement, 22 Feb 2023, by Alan Morrison
To realize the promise of FAIR data, companies must be data mature. They must adopt data-centric architecture and the #FAIR (findable, accessible, interoperable and reusable) principles. When they do, the data they need will be linked and self-describing. The data when queried will tell you where it is.
A desiloed, #semantic graph data abstraction--the only feasible means behind creating FAIR data at this point--is not only the means to data discovery, but also a path to model-driven development and data sharing at scale, both of which will break an organization's habit of duplicating data and logic.
This webinar highlights fresh enterprise case studies that are starting to realize the dream of #FAIRdata, as well as how these companies are succeeding:
- Zero copy integration: How to think about eliminating #dataduplication and stop the application buying binge that only exacerbates the problem.
- Dynamic, unified data model: Standard graphs provide a means of modeling once, use anywhere, for conceptual, logical and physical purposes all at once.
- Persuasion and teamwork: The #graph approach provides an ideal way to loop business units and domain experts in and empower them to recommend model changes that are easily implemented.
The whole process is bringing #enterprises like Walmart, Uber, Goldman Sachs and Nokia into the age of #contextualcomputing. Learn how to be a fast follower by thinking big, but starting small.
• Intelligent objects introduce a new vision for strengthening communication, relationships and business. Each system must be able to communicate with humans and non-humans, and its capabilities bring scalability, adaptability, flexibility and greater efficiency.
• IoT generates a large growth in complexity, and a new method is necessary to design apps correctly.
• We have developed in the field the Here&Now method for managing contextual, liquid, intelligent and connected applications. This means designing software with a new level of cognitive artificial intelligence, able to deploy applications whose level of understanding depends on context; such an application learns from events and has some autonomy with respect to routine activities.
Graph Foundations for Advanced Analytics and Collaboration, by Alan Morrison
Presentation on Knowledge Graph Foundations and how they're used.
Presented at TechTarget ML and AI Summit
September 20, 2022
View the full video recording of this deck at https://www.brighttalk.com/webcast/9059/556690
DCAF transformation & KG adoption 2022, by Alan Morrison
A keynote presentation on knowledge graph adoption trends and how to do digital transformation differently.
Delivered at the Enterprise Data Transformation & Knowledge Graph Adoption event, a Semantic Arts DCAF event, February 28, 2022
Data-centric design and the knowledge graph, by Alan Morrison
The #knowledgegraph--smart data that can describe your business and its domains--is now eating software. We won't be able to scale AI or other emerging tech without knowledge graphs, because those techs all require a transformed data foundation, large-scale integration, and shared data infrastructure.
Key to knowledge graphs are #semantics, #graphdatabase technology and a Tinker Toy-style approach to adding the missing verbs (which provide connections and context) back into your data. A knowledge graph foundation provides a means of contextualizing business domains, your content and other data, for #AI at scale.
This is from a talk I gave at the Data Centric Design for SMART DATA & CONTENT Enthusiasts meetup on July 31, 2019 at PwC Chicago. Thanks to Mary Yurkovic and Matt Turner for a very fun event!
Data-Centric Business Transformation Using Knowledge Graphs, by Alan Morrison
From a talk at the Data Architecture Summit in Chicago in 2018--reviews digital transformation and what deep transformation really implies at the data layer. Cross-enterprise knowledge graphs are becoming feasible and can be a key enabler of deep transformation.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (a minimal sketch follows this list).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
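As a hedged illustration of the automated data-validation point above (the column names and rules are hypothetical):

```python
# Sketch: automated data-quality checks that catch errors at the source.
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable data-quality violations for a batch."""
    errors = []
    if df["customer_id"].isna().any():          # completeness check
        errors.append("customer_id contains nulls")
    if df["customer_id"].duplicated().any():    # uniqueness check
        errors.append("customer_id contains duplicates")
    if (df["order_total"] < 0).any():           # validity check
        errors.append("order_total has negative values")
    return errors

batch = pd.DataFrame({"customer_id": [1, 2, 2], "order_total": [10.0, -5.0, 7.5]})
for problem in validate(batch):
    print("DATA QUALITY:", problem)  # surface issues before they flow downstream
```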
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace, by Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Notes on adjusting primitives for graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
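The notes above benchmark C++/CUDA and OpenMP implementations; as a rough analogue in Python (the sketch language used throughout this document), here is a sequential-vs-vectorized element sum with two storage precisions. The sizes, and float16 standing in for bfloat16, are assumptions for illustration, not the author's benchmark:

```python
# Sketch: compare a sequential sum against vectorized reductions at two precisions.
import time
import numpy as np

N = 10_000_000
x32 = np.random.rand(N).astype(np.float32)  # float32 storage
x16 = x32.astype(np.float16)                # float16 storage (stand-in for bfloat16)

t0 = time.perf_counter()
s_seq = 0.0
for v in x32[:100_000]:                     # sequential baseline, on a slice only
    s_seq += float(v)
t1 = time.perf_counter()
s32 = x32.sum()                             # vectorized reduce, float32
t2 = time.perf_counter()
s16 = x16.sum(dtype=np.float32)             # accumulate in float32 to limit error
t3 = time.perf_counter()

print(f"seq slice {t1-t0:.4f}s | reduce f32 {t2-t1:.4f}s | reduce f16 {t3-t2:.4f}s")
print(f"storage-precision error: {abs(s32 - s16):.3f}")
```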
DCAF 2023 1 and 2.pdf
1. DCA: Current Themes and Trends*
Alan Morrison
Data-Centric Architecture Forum, May 2023
[Cover image: Alain Audet, https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/]
*Separate talk to cover NLP/LLMs
2. Business goals enabled by a connected, shared data ecosystem
[Diagram: five business goals linked in a shared cycle: Buying, Helping, Making, Selling, Sharing]
3. Inhibitors to ecosystem-level sharing
● Data feudalism
● Poorly defined regulatory challenges
● Weak public sector
● Public apathy
● Technology + investor inertia and lack of clear vision
● Magic bullet syndrome
● Media groupthink
● Idol worship
● Pervasive myopia
● Lack of organizational empowerment of foxes over hedgehogs
4. Unclaimed data market territory
[Figure: Present vs. Future Shared Data Market Map. Twelve steps to FAIR* data power run from bottom to top: Identification, Classification, Connection, Contextualization, Abstraction, Reasoning, Synthesis, Divining intent, Divining purpose, Immediacy, Actionability, and FAIR. The reach of current ML efforts (the staked claims) covers only the lowest steps; the rest remains unclaimed market territory.
*Findable, accessible, interoperable, reusable data]
7. Opportunity: Unitary data + description logic = knowledge
[Diagram: FAIR data and associated description logic at the center of four domains: “data management” (structured data, mostly), knowledge management (internally shared), content management (externally shared), and learning management (internal coursework).]
FAIR data is data users can have confidence in for many purposes. Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another. In a knowledge graph context, documented knowledge = FAIR data. Under the FAIR data umbrella are all heterogeneous types of data/content.
8. To create a knowledge graph, users can start with a single triple
[Figure: the Linked Open Data Cloud, 2022, next to a starter triple for a knowledge graph.]
A standard knowledge graph consists of triplified, relationship-rich data. The data model, or ontology, is also described in triples and lives with the rest of the data. Ontologies can also be managed as data. Linking triples merely requires a verb (or predicate, or described edge) to link them.
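A minimal sketch of that starter triple in Python with rdflib (the namespace and entity names are invented for illustration):

```python
# Sketch: a knowledge graph begins with one subject-predicate-object triple.
from rdflib import Graph, Literal, Namespace

EX = Namespace("https://example.org/")  # hypothetical namespace
g = Graph()

# One triple: the verb (predicate) "makes" is the described edge linking two entities.
g.add((EX.AcmeCorp, EX.makes, EX.Widget))

# Linking another triple just reuses an entity with a further predicate.
g.add((EX.Widget, EX.hasName, Literal("Widget Mark I")))

print(g.serialize(format="turtle"))
```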
9. Simple way to start a business knowledge graph (besides using gist)
● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila! You get a connected graph!
● ✨ Decentralize the process by having each team publish their own JSON-LD. For example, let the sales team publish the sales data and ask them to link each sale to the correct product and client.
● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power of GPT to assist new teams in publishing their JSON-LD and integrating it back into your enterprise-wide Knowledge Graph.”
Key to scaling external/internal integration: use the schema.org-modeled JSON-LD from websites GPT is trained on and connect it with internal data also modeled with schema.org.
–#HT Tony Seale, UBS, https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
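A hedged sketch of that approach: one team publishes a sale as three-part JSON-LD statements against schema.org terms, and rdflib turns the document into graph edges (the identifiers are invented for illustration):

```python
# Sketch: a team's JSON-LD document becomes triples in a connected graph.
import json
from rdflib import Graph

sale = {
    "@context": {
        "schema": "https://schema.org/",
        "orderedItem": {"@id": "schema:orderedItem", "@type": "@id"},
        "customer": {"@id": "schema:customer", "@type": "@id"},
    },
    "@id": "https://example.org/sales/1001",  # hypothetical identifiers
    "@type": "schema:Order",
    "orderedItem": "https://example.org/products/widget-mk1",
    "customer": "https://example.org/clients/acme",
}

g = Graph()
g.parse(data=json.dumps(sale), format="json-ld")  # JSON-LD support is bundled in rdflib 6+
for s, p, o in g:
    print(s, p, o)  # each three-part statement is one edge of the knowledge graph
```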
10. Yes, data warehousing focused on the integration problem
● Pro: Identified the critical problem to solve
● Con: Advocated a method that doesn’t delve deep enough to solve today’s problem
● We still face the unified data model challenge
11. No, data warehousing model conformance doesn’t scale
“I spent a good 15 years working in financial services at some pretty big banks. Half of the IT change budget is spent on integration and the by-products of integration…. I saw as the technology was advancing that the percentage wasn’t going down – in fact, it was going up. At some point, is the integration tax going to be 100 percent?”
– Dan DeMers, CEO of Cinchy
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
12. How data warehousing stopped scaling
“They recognized that these themes ended up in all these legacy apps. Sales rolled up against a geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have those conformed dimensions and a small number of facts. Let’s bring the facts from all the different systems and snap them together according to these conformed dimensions…. Brilliant idea, but I think what actually happened over time is the workload just got greater and greater. The ability of people to actually conform those dimensions kept eroding….”
– Dave McComb, President, Semantic Arts
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
13. Data warehousing can’t solve today’s integration challenge
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
14. How did we get here? By selling the old as new
15. Why large-scale integration?
Large-scale integration is essential to avoiding observational bias. The analogy of the drunk looking for his money under the lamppost describes the nature of this bias: the drunk is looking for his money where the light is, even though he knows the money is in the shadows.
To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks.
16. Semantic standards allow a desiloed data landscape for interactive, interoperable digital twins and agents
17. Promise of digital twins and agents: way beyond APIs
[Diagram: autonomous agents layered over digital twins/small KGs, layered over sensor nets. Locale: Portsmouth, UK. Source: Iotics, 2019 and 2023.]
18. How shared graph semantics helps
● Boosts meaningful results and relevancy (addressing the lack of data and logic transparency and cohesiveness)
● Contextualizes data for management and reuse with relationship logic
● Scales meaningful connections between contexts (relevant relationships living with entities)
● Enables Metcalfe’s network-of-networks effect (network_effect^N)
● Enables model-driven development via knowledge graphs (code once, reuse anywhere)
● Provides access via KGs to logic programs as well as heterogeneous, smart data
● Scales efficiencies and economies so that energy consumption is reduced
19. KG centricity makes reliable, automated data webs possible
“Data teams report spending 25-30% of their time cleaning, labelling, and gathering data sets.... [Some can spend 80% plus] What we know for sure is that data teams and knowledge workers generally spend a noteworthy amount of their time procuring data points that are available on the public web…”
It took Google knowledge panels one month and twenty days to update following the inception of a new CEO at Citi, an F100 company. In Diffbot’s Knowledge Graph, a new fact was logged within the week, with zero human intervention and sourced from the public web.
– Merrill Cook, Diffbot Blog, 2021-2022
20. Example capabilities in Diffbot’s AI-automated KG
Mike Tung, “VLDB 2020: The Diffbot Knowledge Graph,” 2020
21. “Decentralization”: Why you should care
● Further desiloing
● More systems federation
● More interorganizational use potential
● Data Centric approach to architecture
● “Decentralized/Web3 stack”
● More storage options and tiering
● Options at different temperatures (hot vs. cold storage) for new use cases
● More captive and independent storage
22. The Five Commingled Phases of Compute, Networking and Storage
1st: Centralized storage and compute, with minimal networking (Mainframe and Green Screens)
2nd: Application distribution via proprietary and IP networking (Client-Server and Desktops)
3rd: Simple web hosting + legacy client-server storage (Early Web on Client-Server)
4th: Commodity servers + storage + some virtualization (Distributed Cloud and Mobile Devices)
5th: Compute and storage more loosely coupled, virtualized, controlled and data-centric (“Decoupled” and “Decentralized” Cloud)
[Chart notes: over time, the trend runs from more centralized and application-centric toward less centralized and data-centric; all phases are still active and evolving.]
23. Degree of control assumes a continuum, not a binary split
See Thomas W. Malone, Inventing the Organizations of the 21st Century, MIT Press, 2003, 45ff.
24. SOLID: Federated storage and decentralized apps
Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
25. SOLID shared, federated XaaS: Construction industry
“TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod
Company-specific SOLID storage pods and access control can be managed by each supply chain partner. Graphmetrix, as the digital twin provider, manages the system and system-level apps.
26. Peergos makes personal file storage management possible via IPFS and a browser
Peergos technology logical architecture, https://peergos.org/technology, 2019
Peergos is a personal data cloud storage environment that also uses blockchain-based decentralized public-key infrastructure (DPKI). Consider it as an alternative to Google or Amazon Photos, for example.
28. OriginTrail + BSI’s supply chain tracking and tracing
OriginTrail and the British Standards Institute (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach. OT uses a decentralized knowledge graph that connects to one of several different blockchains. This method enables shared data reuse and other synergies across the supply chain.
29. Seven obstacles to adoption of decentralized, interorganizational environments
32. Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
33. From NLP, to stochastic parrots, to neurosymbolic AI
Alan Morrison
Data-Centric Architecture Forum
May 2023
34. What’s a “stochastic parrot” and one who worships the same?
“A Language Model is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”
– Emily Bender, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” ACM paper presented at FAccT ’21, March 3-10, 2021, virtual event, Canada
Stochastic parrot worshippers: those who mindlessly praise LLMs without realizing they’ve mistaken the parrot part (probabilistic language methods alone) for the whole. These worshippers seem to assume those methods alone will deliver artificial general intelligence.
Related term: documentation debt (also per Bender, et al.). “When we rely on ever larger datasets we risk incurring documentation debt,” they say, “i.e., putting ourselves in the situation where the datasets are both undocumented and too large to document post hoc…. The solution, we propose, is to budget for documentation as part of the planned costs of dataset creation.”
36. What’s Natural Language Processing (NLP)?
“The root of Natural Language Processing dates back to the 1950s when Alan Turing first devised the Turing Test. The objective of the Turing Test was to determine whether a computer was truly intelligent based on its ability to interpret and generate natural language as a criterion of intelligence.”
– Tithy Sreemani, Analytics Vidhya blog, 2022
37. What’s natural language understanding (NLU)?
1. A form of overpromising and underdelivering, or
2. A serious, ongoing linguistics + cognition endeavor to model how human understanding works.
[Figure: a sentence-level model based on Role and Reference Grammar, by PAT Inc., 2022.]
38. What’s a large language model (LLM)?
1. A neural network with many layers (“deep learning”).
2. A transformer model that “learns” context a token at a time, in sequence.
3. A tokenizer that converts words to numbers and numbers to words.
4. A token-to-embedding (vectorization) transformer.
5. An ML model trained on very large data sets, with millions to billions of parameters (akin to multi-dimensional topographic features).
6. The NLP (natural language processing) system currently in vogue.
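Points 3 and 4 can be made concrete with a short sketch using the Hugging Face transformers library (GPT-2 is chosen only because it is small and public; this is an illustration, not the deck’s example):

```python
# Sketch: words -> token ids (tokenizer) -> embedding vectors, the LLM's front door.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

batch = tokenizer("Knowledge graphs know.", return_tensors="pt")
print(batch["input_ids"])  # words converted to numbers

embeddings = model.get_input_embeddings()(batch["input_ids"])
print(embeddings.shape)    # one vector per token, e.g. [1, n_tokens, 768]
```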
40. Solving arithmetic or chasing “facts” with LLMs wastes time and energy
“Suppose that I wanted to find out the square root of five. If I asked an LLM (say ChatGPT), getting this answer involves the following steps:
● Me: Send a prompt saying “What is the square root of 5?”
● ChatGPT: Do I understand the concept of square root? Yes, I do … it’s a math function.
● ChatGPT: There is a Python function that can be used to invoke that function, in the Python Math Library. Retrieve that library.
● ChatGPT: Evaluate the number 5 with the function call to get the value 2.236.
● ChatGPT: Construct a response and send that response back to the client.
This assumes that everything goes right.”
– Kurt Cagle, The Cagle Report
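The single-operation alternative the quote alludes to, as a sketch (its output is the exact value the LLM round-trip only approximates):

```python
# Sketch: the deterministic, one-call answer to "what is the square root of 5?"
import math

print(math.sqrt(5))  # 2.23606797749979, computed directly with no prompt round-trip
```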
41. Knowledge graphs know; LLMs need prompts and figure it out, sort of
“LLMs have to figure things out. They follow an iterative feedback loop called a langchain, with either a human, itself, or a combination of the two. This langchain model should be emulatable with SPARQL.
“Update. I’m playing around with this idea on Jena/Fuseki, and the early results are … intriguing. The key is to recognize that you are doing mutations to the database, which makes many DBAs cringe. However, I don’t think there is any way you can get to conversational AI on a knowledge graph without constantly building (and, when necessary, destroying) contextual graphs.”
– Kurt Cagle, “Figuring Out vs. Knowing,” The Cagle Report
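A hedged sketch of the mutation Cagle describes, creating and destroying a contextual graph on a local Jena/Fuseki instance via SPARQLWrapper (the dataset path and graph IRI are assumptions):

```python
# Sketch: build a short-lived contextual graph in Fuseki, then destroy it.
from SPARQLWrapper import POST, SPARQLWrapper

update = SPARQLWrapper("http://localhost:3030/ds/update")  # default Fuseki dataset path
update.setMethod(POST)

update.setQuery("""
INSERT DATA {
  GRAPH <https://example.org/context/session-42> {   # hypothetical contextual graph
    <https://example.org/user/alice> <https://example.org/asked> "square root of 5" .
  }
}
""")
update.query()

# When the conversational turn is over, destroy the context, as the quote suggests.
update.setQuery("DROP GRAPH <https://example.org/context/session-42>")
update.query()
```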
42. Idea: Connect the LLM directly to a KG such as Wikidata
“We can just use the SPARQL query generation ability directly and ask queries against Wikidata. Not only can we connect the LLM to a knowledge graph, but also to a repository of functions such as wiki functions. LLMs can learn to use KGs and functions as tools.”
– Denny Vrandečić, Wikimedia Foundation, 2023
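A minimal sketch of asking Wikidata directly (this query fetches Obama’s birthplace, the same fact cited on the next slide; Q76 and P19 are public Wikidata identifiers):

```python
# Sketch: look a fact up in the Wikidata knowledge graph rather than prompting an LLM.
from SPARQLWrapper import JSON, SPARQLWrapper

sparql = SPARQLWrapper(
    "https://query.wikidata.org/sparql",
    agent="kg-lookup-sketch/0.1",  # Wikidata asks clients to identify themselves
)
sparql.setReturnFormat(JSON)
sparql.setQuery("""
SELECT ?placeLabel WHERE {
  wd:Q76 wdt:P19 ?place .   # Q76 = Barack Obama, P19 = place of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["placeLabel"]["value"])  # a sourced, deterministic answer
```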
43. Each machine learning answer creates some uncertainty
“You can use machine learning to retrieve Obama’s birthplace every time you need it, but it costs a lot, and you’re never sure it’s correct.”
– Jamie Taylor of Google
44. Efficiency argument for knowledge graphs
“Why would you ever use a 96-layer, 156 billion parameter large language model to do multiplication, when that’s something you can do in a single operation on your CPU?”
“Why internalize knowledge in an LLM, when you can externalize it in a graph store and look it up when you need it?”
“Use LLMs where they are efficient.”
– Denny Vrandečić of the Wikimedia Foundation
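Vrandečić’s externalization point, sketched with a local graph store (the graph and IRIs are invented for illustration):

```python
# Sketch: externalize a fact as data and look it up on demand.
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/")
kb = Graph()
kb.add((EX.Obama, EX.birthplace, EX.Honolulu))  # stored once, as a triple

# Retrieval is a single deterministic lookup, not a forward pass through 96 layers.
print(kb.value(subject=EX.Obama, predicate=EX.birthplace))
```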
45. To scale FAIR data, use an assisted, hybrid AI approach
Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
46. How hybrid AI helps in research
“LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.”
– Mike Tung, CEO of Diffbot, Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/
LLMs alone aren’t a reliable research tool because they hallucinate; you can’t trust the answers unless you know the answer already. Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot’s. Both of these capabilities harness the precise logical description missing in current LLM Q&As.
47. NLP’s compost grinder data mentality
[Image: compost grinder, https://pixabay.com/photos/compost-grinder-compost-chipper-3389088/]
48. Versus KGs growing naturally in companion plant mode
● Rich data ecosystems evolve naturally, by comparison with underdescribed, fragmented data assets
● Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent
● Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods
[Image: https://www.fruitsaladtrees.com/blogs/news/ediblegarden]
49. AI’s Wave III: Less wasteful, more explicit smart data management via a knowledge graph foundation
50. Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
51. NLP versus NLU: Most true understanding is unclaimed territory
[Figure: the Present vs. Future Data Market Map from slide 4, repeated. The 12 steps to FAIR* data power run, bottom to top: Identification, Classification, Connection, Contextualization, Abstraction, Reasoning, Synthesis, Divining intent, Divining purpose, Immediacy, Actionability, FAIR. The reach of current ML efforts (the staked claims) covers only the lower steps; most true understanding remains unclaimed market territory.
*Findable, accessible, interoperable, reusable data]
61. Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com