Presentation by Al Hamilton and Cody Johnson to the Canberra Semantic Web Meetup Group on why producers of official statistics are interested in the semantic web community (including Linked Open Data), and outlining experimental work by Cody Johnson on transforming selected Population Census data, released by the ABS in SDMX-ML, into RDF Data Cube Vocabulary format.
2. Outline
I. Context
– Transforming national & international statistical systems
– Semantic Web / Linked Data meets Official Statistics
– SemStats 2013
– Parameters for the R&D project
II. Investigation of existing tools
III. Summary of the transformation process
IV. Lessons learned
V. Discussion
3. 2009 (Australia)
• The case for an international statistical innovation program
Transforming national and international statistics systems
• Future capabilities
1. From static data products to “common information services”
2. From publications to communication
3. Support for transaction data flowing at a much higher volume
4. Ability to rapidly incorporate new issues and views of data into standards and classifications
5. ‘Rapid-response’ capability
6. Connecting processes and passing metadata and data easily between them
7. Analysing assemblies of data
4. The Challenges
• Increasing cost & difficulty of acquiring survey data
• New sources & changing expectations
• Rapid changes in the environment
• Competition for skilled resources
• Diminishing budgets
• Riding the big data wave
5. HLG
• High-Level Group for the Modernisation of Statistical Production and Services
• Comprises 10 heads of national and international statistical organisations
– Gosse van der Veen (Netherlands) - Chairman
– Brian Pink (Australia)
– Eduardo Sojo Garza-Aldape (Mexico)
– Enrico Giovannini (Italy)
– Woo, Ki-Jong (Republic of Korea)
– Irena Križman (Slovenia)
– Katherine Wallman (United States)
– Walter Radermacher (Eurostat)
– Martine Durand (OECD)
– Lidia Bratanova (UNECE)
• The official statistics industry and its place in the wider information industry – from “Strategy to implement the vision of the HLG” (2012)
6. Grouping the challenges
1. Product Challenge - Modernising Statistical Services
• Designing and delivering new and better statistical outputs (products and services)
2. Process Challenge – Modernising Statistical Production
• Developing and implementing new and better production processes and methods which are capable of delivering statistical outputs with
i. reduced cost, and
ii. greater flexibility.
7. HLG Strategy
• Standards-based, collaborative modernisation of official statistics.
• Create an environment (eg a “common architecture”) that facilitates collaborative development, sharing and reuse of
– statistical business processes
– statistical methods
– IT components
– data repositories
• Explicit role for
– common conceptual frameworks, eg
• GSIM (Generic Statistical Information Model)
– and common implementation standards, eg
• SDMX (Statistical Data and Metadata eXchange), working with
• DDI (Data Documentation Initiative)
8. ABS main data services support SDMX
• ABS.Stat Beta
– Dissemination from predefined aggregate data cubes
• eg Consumer Price Index
– Featured at GovHack 2013
– Based on OECD.Stat
• Now used by OECD, IMF, UNESCO, European Commission, ABS, Statistics New Zealand, Statistics Italy
• Further development through SIS Collaboration Community
• TableBuilder
– Dissemination of on-demand tabulations from microdata
• Includes Population Census
9. Harnessing the opportunities
• Global community around SDMX
– intersects with SIS Collaboration Community
• Working on
– SDMX to JSON (JavaScript Object Notation)
• Making life easier for third party developers
– No need to parse SDMX-ML
• Object model similar to Data Cube Vocabulary (DCV)
• Expected to be released for review in September
– SDMX to Data Cube Vocabulary (DCV)
• Much earlier stage within SIS Collaboration Community
10. Layering standards on standards
• RDF Data Cube Vocabulary (DCV) developed under W3C
– designed for publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts
– based upon the approach used by the SDMX ISO standard for statistical data exchange
– very general and can be used for other data sets such as survey data, spreadsheets and OLAP data cubes
11. Use of DCV
• Usage within
– data.gov.uk
– Eurostat
– Other institutions within the European Union via the EU’s Open Data Portal
• eg European Environment Agency
– Experimental use within data.gov.au
12. Linked Data view on Official Statistics
• Official Statistics and the Practice of Data Fidelity
– Official statistics are the “crown jewels” of a nation’s public data
– Provide empirical evidence for policy making and economic research
– Statistical offices are among the most “data-savvy” organisations in government
– Handling of Statistical Data as Linked Data requires particular attention to maintain its integrity and fidelity
• Linked SDMX Data
– Challenges
• Automation of the transformation of data from high profile statistical organizations
• Minimization of third-party interpretation of the source data and metadata, and lossless transformations
13. (Unofficial) view from Official Statistics
• Semantic Statistics opportunities include:
– external application of statistical classifications, and other statistical concept schemes, as ontologies
– simpler, more flexible and more powerful use of statistical data alongside other data
– partnering more closely with other “data” communities
• Semantic Statistics issues and risks include:
– ensuring the production process is sustainable
– ensuring semantics are identified consistently across all statistical outputs from a single agency
– possible lack of rigour when defining and linking concepts to outputs from other sources
– the possibility of “fuzzy” semantics leading to incorrect data analyses
14. SemStats 2013
• Interest in “Semantic Statistics” is growing rapidly within Statistical and Semantic Web communities
• There are existing semantic web developments building on both SDMX and DDI
• SemStats 2013 provides a rare opportunity to interact with world experts while they’re in Australia
• We are interested in what entrants might create and demonstrate in regard to the SemStats 2013 Challenge
15. SemStats 2013 Challenge
• Provides Australian and French Census data in Data Cube Vocabulary (DCV) format
– Data is Geography x Sex x Age x “Activity” status
– Entrants are asked to demonstrate value from innovative application of semantic web technologies to the data.
16. Aim when preparing Australian content
• use as an opportunity for practical learning
• start with SDMX-ML (not, eg, CSV) (if possible)
– Plan A: SDMX-ML from TableBuilder
• use existing international tools for SDMX-ML to DCV transformations (if possible)
• do the work within the ABS (if possible)
• Plan B was to ask INSEE (Statistics France) to help us with the transformation
17. Investigation
• Datalift
– Supports multiple input types
– Generic transformation
– Supports dissemination to the web
• Mimas
– XSLT based
– Complicated
• Guillaume report
– From INSEE
– Highly tailored to the input data
18. Datalift
• Free to use – source code also available
• Java web application
• Supports multiple input types
– Semantic graphs
– Relational databases
– Files (CSV, XML, etc)
• Supports entire cycle
– INSEE plan to use in future
• SDMX -> DCV plug-in in development
19. Mimas
• Inflexible
– XML input only
– XML output only
• Cumbersome
– Requires multiple intermediate conversions
• Inefficient for large volumes of data
20. Guillaume Report
• INSEE short term solution
• Datalift was not mature enough
• MIMAS identified as cumbersome and inefficient
• Opted to use Apache Jena for a small Java application
21. Technology Overview
• Census TableBuilder
– Data extracted in SDMX and CSV
• Java
– Apache Jena library
– SDMX 2.0 XML beans
• Ontologies used
– Simple Knowledge Organisation System
– Data Cube Vocabulary
• Turtle RDF syntax
– Easy to read for humans and machines
22. SDMX Extraction Tool Overview
• Reads in SDMX structure file
– Uses SDMX 2.0 beans to parse file
• Disassembles XML to main components
– Code lists
– Concepts
– Key Families
• Build semantic model with Apache Jena
• Write to file in Turtle syntax
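The ABS extraction tool itself is not reproduced in the slides, but the last two steps listed above (build a semantic model with Apache Jena, write Turtle) can be illustrated with a minimal sketch for one parsed component: an SDMX code list mapped to a SKOS concept scheme. The class name, base URI and hard-coded codes are assumptions for illustration only, and the package names follow current Apache Jena releases; in the real tool the codes and labels would come from the parsed SDMX 2.0 structure beans rather than being hard-coded.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class CodelistToSkosSketch {
    // Illustrative namespaces; the URIs minted by the actual ABS tool may differ.
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String BASE = "http://example.org/abs/codelist/";

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("skos", SKOS);

        Property prefLabel = model.createProperty(SKOS, "prefLabel");
        Property notation  = model.createProperty(SKOS, "notation");
        Property inScheme  = model.createProperty(SKOS, "inScheme");

        // One SDMX code list (e.g. a sex code list from the structure file) becomes a skos:ConceptScheme.
        Resource scheme = model.createResource(BASE + "CL_SEX")
                .addProperty(RDF.type, model.createResource(SKOS + "ConceptScheme"));

        // Each code in the list becomes a skos:Concept; hard-coded here,
        // read from the parsed SDMX 2.0 beans in the real tool.
        String[][] codes = { {"1", "Male"}, {"2", "Female"} };
        for (String[] code : codes) {
            model.createResource(BASE + "CL_SEX/" + code[0])
                    .addProperty(RDF.type, model.createResource(SKOS + "Concept"))
                    .addProperty(notation, code[0])
                    .addProperty(prefLabel, code[1])
                    .addProperty(inScheme, scheme);
        }

        // "Write to file in Turtle syntax" - stdout here for brevity.
        model.write(System.out, "TURTLE");
    }
}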
29. Data Structure Definition
• Annotated Data Structure Definition; the callouts highlight:
– the type that observation values can only take
– the list of codes to use for a coded dimension
– the concept a dimension is measuring
– what the observation is measuring (a DCV rendering of these pieces is sketched below)
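For orientation, a hedged sketch of how the highlighted pieces of a Data Structure Definition carry over into the Data Cube Vocabulary: a coded dimension component pointing at its code list, and a measure component saying what the observation value means. The base URI, the component layout and the use of the sdmx-dimension/sdmx-measure properties are illustrative choices, not the actual ABS output.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class DsdSketch {
    static final String QB   = "http://purl.org/linked-data/cube#";
    static final String SDIM = "http://purl.org/linked-data/sdmx/2009/dimension#";
    static final String SMEA = "http://purl.org/linked-data/sdmx/2009/measure#";
    static final String BASE = "http://example.org/abs/census/";   // illustrative only

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Resource dsd = m.createResource(BASE + "dsd")
                .addProperty(RDF.type, m.createResource(QB + "DataStructureDefinition"));

        // Coded dimension: the "list of codes to use" becomes qb:codeList on the component.
        Resource sexComponent = m.createResource()
                .addProperty(m.createProperty(QB, "dimension"), m.createResource(SDIM + "sex"))
                .addProperty(m.createProperty(QB, "codeList"), m.createResource(BASE + "codelist/CL_SEX"));
        dsd.addProperty(m.createProperty(QB, "component"), sexComponent);

        // Measure: "what the observation is measuring" - here a person count via sdmx-measure:obsValue.
        Resource measureComponent = m.createResource()
                .addProperty(m.createProperty(QB, "measure"), m.createResource(SMEA + "obsValue"));
        dsd.addProperty(m.createProperty(QB, "component"), measureComponent);

        m.write(System.out, "TURTLE");
    }
}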
30. The Data - SDMX
• Series key – dimensions being measured
• Attributes – extra metadata about observation
• Obs – the value of the observation (i.e. people counted)
31. The Data - DCV
• More condensed – attributes attached to the dataset instead of the observation
• Annotated example observation; the callouts highlight the dimensions, their coded values, the observation value, and the dataset the observation is from (sketched below)
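A sketch of a single observation in this shape, using the Geography x Sex x Age x "Activity" status breakdown of the Challenge data (slide 15). The URIs, the locally minted activityStatus dimension property and the count are illustrative only; the published ABS file may use different properties and identifiers.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class ObservationSketch {
    static final String QB   = "http://purl.org/linked-data/cube#";
    static final String SDIM = "http://purl.org/linked-data/sdmx/2009/dimension#";
    static final String SMEA = "http://purl.org/linked-data/sdmx/2009/measure#";
    static final String BASE = "http://example.org/abs/census/";   // illustrative only

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();

        // Dimensions carry coded values; dataset-level attributes hang off BASE + "dataset"
        // rather than being repeated on every observation.
        Resource obs = m.createResource(BASE + "obs/nsw-male-25-employed")
                .addProperty(RDF.type, m.createResource(QB + "Observation"))
                .addProperty(m.createProperty(QB, "dataSet"), m.createResource(BASE + "dataset"))
                .addProperty(m.createProperty(SDIM, "refArea"), m.createResource(BASE + "geography/NSW"))
                .addProperty(m.createProperty(SDIM, "sex"), m.createResource(BASE + "codelist/CL_SEX/1"))
                .addProperty(m.createProperty(SDIM, "age"), m.createResource(BASE + "codelist/CL_AGE/25"))
                // Illustrative local dimension for the Challenge's "activity" status.
                .addProperty(m.createProperty(BASE, "activityStatus"), m.createResource(BASE + "codelist/CL_ACT/1"))
                // The observation value: the number of people counted for this cell.
                .addProperty(m.createProperty(SMEA, "obsValue"), m.createTypedLiteral(1234));

        m.write(System.out, "TURTLE");
    }
}

Because the shared attributes are attached once to the dataset, each observation stays small even when a census table produces millions of cells.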
32. Lessons Learned (1)
• Subject Matter Experts needed
– What dimensions to use?
– What attributes to use?
– What concepts are we measuring?
• Current tools not yet mature
• Full validation of data complex
• Heavy resource usage for large data
– Unable to process SA2 level data on 32bit
33. Lessons Learned (2)
• Conversion straightforward
– Standards very similar
• Promotes reuse
– Power comes from linking data
• Linked nature makes you think about what you are doing
– E.g. How close is INSEE activity to ABS labour force status?
34. Semantic Considerations
• How much, how soon, do we aim to harness opportunities for carrying more usable semantics in Data Cube Vocabulary?
– Expected an external ontology for sex – but most are for Gender
• How close is “close enough” for semantic assertions in Linked Open Data?
• Aim for statistical harmonisation first (eg SDMX Cross Domain Concepts) then explore links to broader ontologies?
• Even data producers are not sure if Age is a common concept across ABS & INSEE (Statistics France).
• Risk of overselling the technical format before semantic payload is sorted?
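The “how close is close enough” question ultimately surfaces as a choice of linking predicate when the RDF is published. A hedged sketch with illustrative URIs: a sex code might reasonably be asserted skos:exactMatch to the corresponding code in the SDMX content-oriented-guidelines vocabulary, while ABS labour force status and INSEE activity status (slide 33) might only warrant skos:closeMatch until the definitions are confirmed to coincide.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class LinkingSketch {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();

        // A strong assertion: the definitions are believed to be the same.
        m.createResource("http://example.org/abs/codelist/CL_SEX/1")
                .addProperty(m.createProperty(SKOS, "exactMatch"),
                             m.createResource("http://purl.org/linked-data/sdmx/2009/code#sex-M"));

        // A weaker assertion: similar concepts, harmonisation not yet confirmed
        // (cf. "How close is INSEE activity to ABS labour force status?").
        m.createResource("http://example.org/abs/concept/labourForceStatus")
                .addProperty(m.createProperty(SKOS, "closeMatch"),
                             m.createResource("http://example.org/insee/concept/activityStatus"));

        m.write(System.out, "TURTLE");
    }
}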
35. Laying the foundations
• The project confirmed that, in order to deliver more useable semantics in our outputs, on a sustainable basis, we need statistical data and metadata to be defined and managed on a consistent, standards-aligned basis across the organisation, including
– across all statistical subject matter domains (social, economic, environmental)
– “end to end” (ie spanning design, collection, processing/integration, analysis and dissemination)
• We also need production processes to be automated & sustainable.
• This is one example of why ABS needs to “modernise statistical production” to reflect the changed world in which we operate and to offer new services that address new needs and expectations of users.
• In the 13/14 Budget Papers, funding of $2.1 million was provided to develop a second pass business case for a major statistical infrastructure and business process reengineering project.
National Statistical Institutions face shared constraints and challenges.
External challenges:
• Rapidly changing external environment – 24/7 access to information
• Increasing demand by sophisticated users for more timely, relevant statistical data to meet ‘current day’ issues
• Increasing demand for more accessible and ‘joined up’ data to solve complex policy questions
Constraints:
• Reduced funding and volatility in funding
• Our costs are increasing significantly – unable to contact many households, response rates are dropping, and it is becoming more and more difficult to recruit and retain interviewers
• Skills shortages – competing for statistical and ICT skills across government
• Complex work programs, siloed processes and aging infrastructure