An introduction to the free and open source software for data catalogs, CKAN (Comprehensive Knowledge Archive Network). Presented at the IV Moscow Urban Forum, Russia, in December 2014. http://mosurbanforum.com/forum2014/
CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
An introductory short course on the free and open source data catalog software CKAN (Comprehensive Knowledge Archive Network). Presented at the Linked Open Data Brasil 2014 conference in Florianópolis, SC.
Big Query - Utilizing Google Data Warehouse for Media Analytics – hafeeznazri
This talk covers an intermediate understanding of Google BigQuery and how Media Prima Digital uses BigQuery as its production data warehouse.
Making Data Timelier and More Reliable with Lakehouse Technology – Matei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
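To make the lakehouse idea concrete, here is a minimal PySpark sketch of writing and reading a Delta Lake table directly on storage, assuming Spark with the open source delta-spark package installed; the storage path and table contents are illustrative:

```python
# Minimal Delta Lake sketch: an ACID-transactional table on (cloud) storage.
# Assumes Spark with the delta-spark package; the path is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Write a transactional table directly on storage: no separate warehouse load step.
events = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
events.write.format("delta").mode("append").save("/tmp/events")

# Readers always see a consistent snapshot, even while writers append.
spark.read.format("delta").load("/tmp/events").show()
```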
Basic concepts, best practices and pricing of BigQuery, the petabyte-scale analytics data platform from Google Cloud Platform. There is a lot to learn about this tool and its features, such as BI Engine and AI Platform.
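As a taste of those basics, a minimal query from Python, assuming the google-cloud-bigquery client library and default application credentials; the public dataset queried here ships with BigQuery, while the project setup is an assumption:

```python
# A minimal BigQuery query from Python, assuming google-cloud-bigquery
# and default application credentials are configured.
from google.cloud import bigquery

client = bigquery.Client()  # picks up the project from the environment

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.name, row.total)
```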
FAIRy stories: the FAIR Data principles in theory and in practice – Carole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology in scientific research and, increasingly, in the commercial and public sectors. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Observability for Data Pipelines With OpenLineage – Databricks
Data is increasingly becoming core to many products, whether to provide recommendations for users, to gain insights on how they use the product, or to use machine learning to improve the experience. This creates a critical need for reliable data operations and an understanding of how data flows through our systems. Data pipelines must be auditable, reliable, and run on time. This proves particularly difficult in a constantly changing, fast-paced environment.
Collecting this lineage metadata as data pipelines are running provides an understanding of dependencies between the many teams consuming and producing data, and of how constant changes impact them. It is the underlying foundation that enables the many use cases related to data operations. The OpenLineage project is an API standardizing this metadata across the ecosystem, reducing complexity and duplicate work in collecting lineage information. It enables the many projects that consume lineage in the ecosystem, whether they focus on operations, governance or security.
Marquez is an open source project, part of the LF AI & Data Foundation, that instruments data pipelines to collect lineage and metadata and enable those use cases. It implements the OpenLineage API and provides context by making visible the dependencies across organizations and technologies as they change over time.
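For a sense of what the standard looks like on the wire, here is a hedged sketch of a minimal OpenLineage run event posted to a local Marquez instance; the /api/v1/lineage endpoint is Marquez's standard intake, while the job and dataset names are illustrative assumptions:

```python
# A minimal OpenLineage run event posted to a local Marquez instance.
# Job and dataset names are illustrative assumptions.
import uuid
import datetime
import requests

event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.datetime.utcnow().isoformat() + "Z",
    "producer": "https://example.com/my-scheduler",  # hypothetical producer URI
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "analytics", "name": "daily_revenue"},
    "inputs": [{"namespace": "warehouse", "name": "orders"}],
    "outputs": [{"namespace": "warehouse", "name": "revenue_daily"}],
}
requests.post("http://localhost:5000/api/v1/lineage", json=event)
```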
Data Warehouse or Data Lake, Which Do I Choose? – DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al.) or the data lake (AWS S3 et al.). There are pros and cons to each approach. While data warehouses give you strong data management with analytics, they don’t do well with semi-structured and unstructured data, tightly couple storage and compute, and carry expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc, who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share perspective on how you should think about what fits best based on your use cases and workloads, and how some real-world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
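As a flavour of the Presto side, a minimal sketch using the presto-python-client package against an assumed coordinator and Hive catalog; host, schema, and table names are illustrative assumptions:

```python
# Querying data lake files through Presto from Python.
# Assumes the presto-python-client package and a coordinator on
# localhost:8080 with a Hive catalog; the table is illustrative.
import prestodb

conn = prestodb.dbapi.connect(
    host="localhost", port=8080, user="analyst",
    catalog="hive", schema="default",
)
cur = conn.cursor()
cur.execute("SELECT region, count(*) FROM orders GROUP BY region")
for region, n in cur.fetchall():
    print(region, n)
```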
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud – Márton Kodok
Vertex AI is a managed ML platform for practitioners to accelerate experiments and deploy AI models. A minimal SDK sketch follows the feature list below.
Enhanced developer experience
- Build with the groundbreaking ML tools that power Google
- Approachable from the non-ML developer perspective (AutoML, managed models, training)
- Eases the life of a data scientist/ML engineer (feature store, managed datasets, endpoints, notebooks)
- Infrastructure management overhead has been almost completely eliminated
- Unified UI for the entire ML workflow
- End-to-end integration for data and AI, with pipelines to orchestrate and solve complex ML tasks
- Explainable AI and TensorBoard to visualize and track ML experiments
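A minimal sketch of that workflow with the Vertex AI Python SDK (google-cloud-aiplatform); the project, bucket, and column names are illustrative assumptions, not a definitive setup:

```python
# Managed dataset -> AutoML training -> managed endpoint, via the
# google-cloud-aiplatform SDK. Project/bucket/columns are illustrative.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="sales",
    gcs_source="gs://my-bucket/sales.csv",
)
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="sales-model", optimization_prediction_type="regression"
)
model = job.run(dataset=dataset, target_column="revenue")

# Deploy to a managed endpoint and request an online prediction.
endpoint = model.deploy(machine_type="n1-standard-4")
print(endpoint.predict(instances=[{"ad_spend": "1200.0"}]))
```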
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies? – Kai Wähner
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses a pattern other than microservices, such as SOA (Service-Oriented Architecture) or client-server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real-time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other depending on the use case and point of view of the project team. The session concludes exploring the vision of event streaming APIs instead of RPC calls.
Understand how event streaming with Kafka and Confluent complements tools and frameworks such as Kong, Mulesoft, Apigee, Envoy, Istio, Linkerd, Software AG, TIBCO Mashery, IBM, Axway, etc.
A Streaming API Data Exchange provides streaming replication between business units and companies. API Management with REST/HTTP is not appropriate for streaming data.
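For contrast with a request/response API, here is a minimal event-streaming sketch using the confluent-kafka Python client; the broker address and topic name are illustrative assumptions:

```python
# Publish/subscribe over Kafka with the confluent-kafka client.
# Broker address and topic name are illustrative assumptions.
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("orders", key="order-42", value=b'{"total": 99.5}')
producer.flush()  # block until the broker acknowledges delivery

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```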
What you need to know about Generative AI and Data Management? – Denodo
Watch full webinar here: https://buff.ly/3UXy0A2
It should be no surprise that Generative AI will have a profound impact on data management in the years to come. Much like in other areas of the technology sector, the opportunities presented by GenAI will accelerate our efforts around all aspects of data management, including self-service, automation, data governance and security. On the other hand, it is also becoming clearer that to unleash the true potential of AI assistants powered by GenAI, we need novel implementation strategies and a reimagined data architecture. This presents an exhilarating yet challenging future, demanding innovative thinking and methodologies in data management.
Join us on this webinar to learn about:
- The opportunities and challenges presented by GenAI today.
- Exploiting GenAI to democratize data management.
- How to augment GenAI applications with corporate data and knowledge (a minimal sketch follows this list).
- How to get started.
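As a sketch of the augmentation idea, the retrieval-augmented pattern below grounds a GenAI answer in corporate documents. Both helper functions are hypothetical stubs, not any particular vendor's API; a real system would use a vector index and a hosted model:

```python
# A retrieval-augmented generation (RAG) sketch. Both helpers are
# hypothetical stubs standing in for a vector store and an LLM API.
def search_corporate_docs(question: str, k: int = 3) -> list:
    # Hypothetical retriever stub returning the k most relevant snippets.
    corpus = ["Q3 revenue grew 12% year on year.",
              "Refund requests above $500 require manager approval."]
    return corpus[:k]

def llm_complete(prompt: str) -> str:
    # Hypothetical model call; echoes part of the prompt for illustration.
    return f"[model answer grounded in: {prompt[:60]}...]"

def answer_with_context(question: str) -> str:
    snippets = search_corporate_docs(question)
    prompt = ("Answer using only the context below.\n\n"
              + "\n---\n".join(snippets)
              + f"\n\nQuestion: {question}")
    return llm_complete(prompt)

print(answer_with_context("What is our refund approval policy?"))
```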
This is a 200-level run-through of the Microsoft Azure big data analytics cloud platform, based on the Cortana Intelligence Suite offerings.
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t... – Databricks
It is widely known that the discovery, development, and commercialization of new classes of drugs can take 10-15 years and greater than $5 billion in R&D investment only to see less than 5% of the drugs make it to market.
AstraZeneca is a global, innovation-driven biopharmaceutical business that focuses on the discovery, development, and commercialization of prescription medicines for some of the world’s most serious diseases. Our scientists have been able to improve our success rate over the past 5 years by moving to a data-driven approach (the “5R”) to help develop better drugs faster, choose the right treatment for a patient and run safer clinical trials.
However, our scientists are still unable to make these decisions with all of the available scientific information at their fingertips. Data is sparse across our company as well as external public databases; every new technology requires a different data processing pipeline; and new data arrives at an increasing pace. It is often repeated that a new scientific paper appears every 30 seconds, which makes it impossible for any individual expert to keep up to date with the pace of scientific discovery.
To help our scientists integrate all of this information and make targeted decisions, we have used Spark on Azure Databricks to build a knowledge graph of biological insights and facts. The graph powers a recommendation system which enables any AZ scientist to generate novel target hypotheses, for any disease, leveraging all of our data.
In this talk, I will describe the applications of our knowledge graph and focus on the Spark pipelines we built to quickly assemble and create projections of the graph from hundreds of sources. I will also describe the NLP pipelines we have built – leveraging spaCy, BioBERT or Snorkel – to reliably extract meaningful relations between entities and add them to our knowledge graph.
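This is not AstraZeneca's actual pipeline, but a minimal PySpark sketch of the general pattern: turning extracted (subject, relation, object) triples into graph edges and persisting a projection. The entity names and output path are illustrative:

```python
# Assembling knowledge-graph edges from extracted relations with PySpark.
# Triples and the output path are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kg-assembly").getOrCreate()

# Each row is a (subject, relation, object) triple, e.g. from an NLP pipeline.
triples = spark.createDataFrame(
    [
        ("EGFR", "ASSOCIATED_WITH", "lung carcinoma"),
        ("gefitinib", "INHIBITS", "EGFR"),
    ],
    ["subject", "relation", "object"],
)

# Project a disease-centric view of the graph and persist it for querying.
disease_edges = triples.filter(triples.relation == "ASSOCIATED_WITH")
disease_edges.write.mode("overwrite").parquet("/tmp/kg/disease_edges")
```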
DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback. In this presentation, we will explain the methodology, and focus on practical aspects of DataOps.
Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why do I care? In this webinar we intend to help you understand not only what Data Virtualization is, but why it's a critical component of any organization's data fabric and how it fits: how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
These are the slides for the talk I gave at JavaDay Kiev 2015, about the evolution of data processing systems from simple ones with a single DWH to complex approaches like the Data Lake, Lambda Architecture and pipeline architectures.
What’s New with Databricks Machine Learning – Databricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Drupal, CKAN and Public Data. DrupalGov 08 February 2016 – Steven De Costa
Main points from this presentation are:
DKAN is not CKAN
CKAN is owning Australian Government
Data.Vic, Data.NSW, Data.SA and Data.Brisbane use Drupal and CKAN together
Single Sign on – https://github.com/ckan/ckanext-drupal7
Taxonomies and CKAN - pulling from CKAN into Drupal to enhance content for Government websites.
Webforms to CKAN - for an 'open data' form collection process.
Resource Views for Drupal - configured for a CKAN portal and organisation.
Telling stories with data...
Data Management Systems for Government Agencies - with CKAN – Steven De Costa
Over the last two days (5th and 6th of November 2015) I was very happy to present to a range of Victorian Government agencies and give them some context on what data management can do for their organisations.
From first principles we went through why data was important and what infrastructure was already in place via data.vic.gov.au for them to leverage. We covered examples of how other agencies, such as the Office of Environment and Heritage in NSW, are rebuilding their data management system to provide a more efficient pipeline for publishing internal and public data.
As always, I could not help highlighting the awesome leadership of WA Parks and Wildlife and the work done by Florian Mayer as the best case example for reducing the costs and friction often involved with publishing data as contextually marked up knowledge.
We covered a number of scenarios where the concept of resource containers for data were considered. This created valuable feedback which has further galvanized my thoughts about how to further extend CKAN to meet the needs of both private and open data portals, and other forms of realtime or unstructured data.
Getting to Know CKAN, 24 June 2015, Singapore – Steven De Costa
Presented in Singapore on 24 June 2015 as part of the Infocomm Development Authority Data 101 series.
Provides an overview on what CKAN is and what organisations are using it for. The session also covered a number of topics related to the organisation of published data.
Ckan foo - CKAN Association overview at CKANcon 2015, Ottawa – Steven De Costa
Presenting the statement of purpose for the CKAN Association, running through a SWOT analysis for the CKAN project and providing a short overview of Link Digital's activities.
Cloud Asia presentation in Singapore, 29 October 2015 – Steven De Costa
'The Perfect Storm' - covering service oriented Government, data classification and public cloud.
Presented as part of the Data as a Service track and hosted by iDA.
Presentation at the OGD2011 conference taking place in Vienna on the 16th of June 2011, as well as at the LOD2 CKAN workshop on the 15th of June 2011: CKAN by Friedrich Lindenberg, Open Knowledge Foundation.
(License: CC-BY 3.0)
My slides from a talk at Forum öppna data at Swedish innovation agency Vinnova. Talk viewable at: http://www.youtube.com/watch?v=Bk85ynm4Lp0&list=UUsXswS5KOLMPl8cvLXwmhZw&index=9
Enabling re-use via CKAN: discoverability and interoperability – Irina Bolychevsky
Talk at @OpenDataWeek in Marseille focused on how technology can power discoverability and interoperability and why they are important. Showcases CKAN's search and discovery functionality, harvesting abilities and data catalog interoperability protocol.
PublicData.eu is striving to become the Pan European one-stop-shop, providing access to open, freely reusable datasets from numerous local, regional and national public bodies across Europe.
After the first release of the PublicData.eu website (Alpha release was Jan 2011 & Beta release was June 2011) and its subsequent upgrades (a significant upgrade was completed in March 2012), OKFN worked towards the deployment of various personalization features, meant to improve the user experience on PublicData.eu and spur more interest and interaction around the official datasets.
The slides for the keynote talk I presented at the 2nd National Open Data Meetup in Brazil (http://2.encontro.dados.gov.br/encontro.html).
Talking about open data, open government and how opening data in and of itself won't be a magic solution - we need open processes and an engaged civil society and media sector. Some steps and some challenges. The distinction between personal data and open data, how to keep the internet open, etc.
This presentation talks about what Open Data is, and how to get it and share it for public use. This presentation is made possible by https://websiteghana.com and https://saviour-sanders.com.
Open Data Portals: 9 Solutions and How they Compare – Safe Software
Get a comparison of CKAN, Socrata, ArcGIS Open Data and other top open data solutions. Plus get answers to best practice questions such as: Which datasets are important to share? What are the approximate costs? Which file formats should the data be shared in? How often should the data get updated? And overall, how can we ensure success with our open data portal?
On November 21st 2014 at the Tufts University Medford campus and November 25th 2014 at the campus of the University of Massachusetts Medical School in Worcester, the BLC and Digital Science hosted a workshop focused on better understanding the research information management landscape.
Mark Hahnel, CEO of Figshare discussed more specific aspects of the research data management landscape and various approaches to address the growing suite of mandates.
In 2018, the SciELO Program will celebrate 20 years of operation, in full alignment with the advances of open science.
The SciELO 20 Years Conference will address and debate – during its three-day program – the main political, methodological and technological issues that define today’s state of the art in scholarly communication, and the trends and innovations that are shaping the future of the universal openness of scholarly publishing and its relationship with today’s Open Access journals, in particular those of the SciELO Network.
The program of the conference is organized around the alignment of SciELO journals and operations with the best practices on communication of open science, such as publishing research data, expediting editorial processes and communication through the continuous publication of articles and the adoption of preprints, maximizing the transparency of research evaluation and the flow of scholarly communication, and searching for more comprehensive systems for assessing research, articles and journals.
A two-day meeting of the coordinators of the national collections of the SciELO Network will take place prior to the Conference with focus on the evaluation of SciELO journals and the SciELO Program and their improvement following the lines of action that will guide their development in the forthcoming five years.
The celebration of SciELO’s 20-year anniversary constitutes an important landmark in SciELO’s evolution, and an exceptional moment to promote the advancement of an inclusive, global approach to scholarly communication and to the open access movement while respecting the diversities of thematic and geographic areas, as well as of languages of scientific research.
OPEN KNOWLEDGE PLATFORM USE-CASES - TugaIT 2018 – Pedro Sousa
Many of Open Knowledge International’s projects are technical in nature. Its most prominent project, CKAN, is used by many of the world’s governments to host open catalogues of data that their countries possess.
CKAN is a tool for making open data websites. (Think of a content management system like WordPress – but for data, instead of pages and blog posts.) It helps you manage and publish collections of data. It is used by national and local governments, research institutions, and other organizations who collect a lot of data.
In this talk I’ll go over some use-cases of Open Knowledge Platform implementations by the Portuguese Government, the architectural features, the difficulties and different approaches to solve them.
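To illustrate the kind of integration such implementations rely on, here is a hedged sketch of CKAN's standard Action API for publishing a dataset; the portal URL, API token, and organization name are illustrative assumptions:

```python
# Publishing a dataset through CKAN's Action API, which every CKAN portal
# exposes under /api/3/action/. URL, token, and org are placeholders.
import requests

PORTAL = "https://demo.ckan.org"

r = requests.post(
    f"{PORTAL}/api/3/action/package_create",
    headers={"Authorization": "YOUR-API-TOKEN"},        # placeholder token
    json={
        "name": "road-sensors-2018",                    # illustrative dataset
        "title": "Road sensors 2018",
        "owner_org": "transport-agency",                # illustrative org
        "notes": "Hourly readings from roadside sensors.",
    },
)
print(r.json()["success"])
```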
Presented by Michael Victor, Abenet Yabowork, Jane Poole, Harrison Njamba, Erick Rutto and Peter Ballantyne at the ILRI open access week workshop, ILRI, Nairobi, 23-25 October 2019
Breakout Session: When Open and Accessible is a Good Thing - Innovative Code ... – Code Communications
2015 Code Industry Summit - Breakout Session: When Open and Accessible is a Good Thing - Innovative Code Enforcement with Open Data and Web Maps
Austin Code Department will discuss the strategies and benefits of sharing Code enforcement data online using GIS. The presentation will highlight Austin Code’s recent successes in making enforcement practices more transparent and accessible, as well as showcase technologies to help you launch your own enforcement maps quickly and easily.
Speakers: Terri Roberts (Division Manager, Austin Code) and Paul Frank (Principal Planner, Development Services, City of Austin)
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence gathering facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well (see the sketch after the list below).
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
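For the Python route, a minimal sketch with the pypowsybl binding, assuming the package is installed; the bundled IEEE 14-bus case avoids needing any external grid data:

```python
# Load a sample network and run an AC power flow with pypowsybl,
# PowSyBl's Python binding. Assumes `pip install pypowsybl`.
import pypowsybl as pp

network = pp.network.create_ieee14()           # bundled IEEE 14-bus test grid
results = pp.loadflow.run_ac(network)          # run an AC power flow
print(results[0].status)                       # convergence status
print(network.get_buses()[["v_mag"]].head())   # bus voltage magnitudes
```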
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Introduction to CKAN
1. CKAN: the open source software powering open data portals
Open Knowledge Foundation
Irina Bolychevsky (@shevski)
info@ckan.org
ckan.org
okfn.org
2. Agenda
1. Open Knowledge Foundation
2. Context
3. CKAN overview
4. Discoverability and Data Management
5. Geospatial, Federation, Multilingualism
6. Roadmap
7. Questions
3. The Open Knowledge Foundation
We are a global movement to open up knowledge around the world and see it used and useful.
okfn.org
4. We build tools and communities to educate, empower and connect people.
OpenDataProtocols.org
5. What does open mean?
A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.
http://opendefinition.org/
9. What problems are we trying to solve?
To utilize digital information more effectively to improve governance, the economy and research.
Context:
● Explosion of digital information
● Ever improving information technology
For example:
● Find a better way to get to work
● Build more sustainable cities
● Spend government money more effectively and legitimately
10. Open Solution
Step 1: get the data openly licensed
Step 2: make it accessible - metadata, formats, portal (ckan.org)
Step 3: start building, linking and turning data into something more - information
11. CKAN - Quick Fact Sheet
Open Source - all code on GitHub
Extensible, flexible, componentized architecture
Developer friendly
Rich RESTful JSON API
12. Serves two main use cases
1. Search and discoverability for re-users of data
2. Data management tools for publishers
13. 1. Search and discovery
Online home for data
Central keyword search
Facet by tags, location, format, licence, publishing department
Browse by groups, keywords, publishers
Standardized interface for viewing
Link to datasets or data directly
Previews and data exploration where possible
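The search features above are all exposed through CKAN's Action API. A hedged sketch of a keyword search with facet filters follows; the portal URL is an illustrative assumption, while the parameters are standard CKAN/Solr:

```python
# Keyword search with facet filters against CKAN's package_search action.
# The portal URL is an illustrative assumption.
import requests

r = requests.get(
    "https://demo.ckan.org/api/3/action/package_search",
    params={
        "q": "water quality",                      # central keyword search
        "fq": "res_format:CSV",                    # facet filter: CSV only
        "facet.field": '["tags", "organization"]', # facets to aggregate
        "rows": 10,
    },
)
result = r.json()["result"]
print(result["count"], "matching datasets")
for name, facet in result["search_facets"].items():
    print(name, [item["name"] for item in facet["items"]])
```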
18. Visualisation & data explorer functionality:
● Previews for many types of data
● Data API for .csv & tabular data
● Linkable URIs and ability to embed visualisations
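The Data API above is served by CKAN's DataStore. A hedged sketch of querying a tabular resource follows; the resource ID is a placeholder for a CSV that has been pushed into the DataStore:

```python
# Querying a tabular resource through CKAN's datastore_search action.
# The resource ID is a placeholder; the portal URL is illustrative.
import requests

r = requests.get(
    "https://demo.ckan.org/api/3/action/datastore_search",
    params={
        "resource_id": "00000000-0000-0000-0000-000000000000",  # placeholder
        "q": "berlin",      # full-text filter over the table
        "limit": 5,
    },
)
for record in r.json()["result"]["records"]:
    print(record)
```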
20. Data Management for Publishers
Easily store and update metadata records
Workflow and approval
Fine-grained authorization controls
Broken link reports
Download and view counts
29. Geo-Search: filter by location / draw a bounding box
WMS previews
Plotting GeoJSON / longitude & latitude in tabular data
Support for:
● INSPIRE
● GEMINI 2.1
● CSW
● ISO 19139
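The bounding-box filter above is exposed by the ckanext-spatial extension as an extra search parameter. A hedged sketch follows; the portal URL and coordinates are illustrative assumptions:

```python
# Spatial search via ckanext-spatial's ext_bbox parameter on package_search.
# Portal and coordinates are illustrative assumptions.
import requests

r = requests.get(
    "https://demo.ckan.org/api/3/action/package_search",
    params={
        "q": "*:*",
        # minx,miny,maxx,maxy in WGS84 -- roughly central Moscow
        "ext_bbox": "37.35,55.57,37.85,55.92",
    },
)
print(r.json()["result"]["count"], "datasets within the bounding box")
```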
30. Harvesting and normalization
Get metadata from external catalogs and endpoints.
CKAN will parse, validate and normalise to create metadata records that look the same to end users no matter where they came from.
We can currently harvest: other CKAN catalogs, CSW endpoints and WAFs serving ISO 19139 documents.
33. Federation
Search across catalogs in aggregator sites (such as publicdata.eu)
Data Catalog Interoperability Protocol: http://spec.datacatalogs.org/
35. Multilingualism
Translated into over 18 languages
https://www.transifex.com/projects/p/ckan/language/sv/
Fully supports all international characters
Added multilingual search, dataset-level language settings, string translations using a vocabulary & more for the European Commission Open Data Portal
37. Open Source
All our code is on GitHub: https://github.com/okfn/ckan
Open issue tracker
Code contributions: https://github.com/okfn/ckan/blob/master/CONTRIBUTING.rst