The document discusses a pilot project within the Big Data Europe initiative that aims to integrate citizen budget data from multiple municipalities. The pilot will develop a platform to aggregate budget and spending data from different sources and formats to allow for analysis and visualization. Technical components like Apache Flume, Kafka, Spark and HDFS will be used to ingest, store and analyze the data. A semantic layer will consolidate the data and link it. The pilot aims to evaluate the platform with municipalities and receive feedback on analyzing a growing amount of integrated budget data over time.
Big Data Europe is an EU-funded Horizon 2020 project that will undertake the foundational work for enabling European companies to build innovative multilingual products and services based on semantically interoperable, large-scale, multilingual data assets and knowledge, available under a variety of licenses and business models.
The Open PHACTS Discovery Platform is bringing together pharmacological data resources in an integrated, interoperable infrastructure, and has been developed to reduce barriers to drug discovery in industry, academia and for small businesses.
The first round of pilots for the Big Data Europe project is about to enter the evaluation phase. This also holds for the Societal Challenge 1: Health. For this challenge the Open PHACTS foundation, Manchester University and the VU Amsterdam are working on the Open PHACTS docker and its integration with the Big Data Europe infrastructure.
This presentation will give you:
- a general overview of the infrastructure and the status of the generic components that are being developed
- an outline of the Societal Challenge and the rationale for the pilot
- a look into the future pilot options
The intended audience is people acquainted with basic development tools like Docker and GitHub who have an interest in Big Data and Drug Discovery.
BDE-SC6 Hangout - “Insight into Virtual Currency Ecosystems”BigData_Europe
The third SC6 webinar was held on 16 February 2017. It was organised by the Consortium of Social Science Data Archives (CESSDA) from Norway and the Semantic Web Company (SWC) from Austria. The theme of the webinar was “Insight into Virtual Currency Ecosystems”, presented by Dr. Bernhard Haslhofer, Data Scientist at the Austrian Institute of Technology.
BDE-BDVA Webinar: BigDataEurope Overview & Synergies with BDVABigData_Europe
Short outline of the project's mission and current status, and a summary of the identified synergies between BDVA and the project, including those at a technical level.
Big Data Europe at eHealth Week 2017: Linking Big Data in HealthBigData_Europe
Of the four V's of big data – Volume, Velocity, Variety and Veracity – the most challenging for the health sector is Variety. Health data comes from many sources, formats and standards – how can we bring these together to reap the benefits of big data technologies?
Big Data Europe is tackling this challenge head-on, building a big data infrastructure flexible enough to tackle all seven Societal Challenges identified by Horizon 2020. Here we demonstrate our pilot implementation of Open PHACTS, which integrates life science data for drug discovery.
12 May 2017
Hajira Jabeen introduces the Big Data Europe Integrator Platform. The deck also includes the slides used to summarise the other presentations in the launch webinar.
BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data EuropeBigData_Europe
Watch this webinar on YouTube: https://youtu.be/MwG0yhrctDs
Slides for the latest update on our Big Data Europe pilot in Societal Challenge 1: Health, Demographic Change and Wellbeing.
Last year we successfully completed the first phase of this pilot, replicating the functionality of the Open PHACTS Discovery Platform on the BDE infrastructure. The Open PHACTS Discovery Platform brings together pharmacological data resources in an integrated, interoperable infrastructure, and has been developed to reduce barriers to drug discovery for industry, academia, and small businesses.
Learn more about the progress we’ve made, and what’s coming next.
1. General overview of the Big Data Europe project and Societal Challenges it addresses (Ronald Siebes, VU Amsterdam)
2. The Big Data Europe infrastructure, generic components that are being developed, and their flexibility for different applications (Hajira Jabeen, University of Bonn)
3. Latest details of the current state of the Open PHACTS architecture in BDE, and ongoing work (Nick Lynch, CTO, Open PHACTS Foundation)
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...BigData_Europe
H2020 BigDataEurope is a flagship project of the European Union's Horizon 2020 framework programme for research and innovation. In this talk we present the Docker-based BigDataEurope platform, which integrates a variety of Big Data processing components such as Hive, Cassandra, Apache Flink and Spark. To support the variety dimension of Big Data in particular, it adds a semantic data processing layer, which makes it possible to ingest, map, transform and exploit semantically enriched data. We will present the innovative technical architecture as well as applications of the BigDataEurope platform for life sciences (OpenPhacts), mobility, food & agriculture and industrial analytics (predictive maintenance). We demonstrate how societal value can be generated by Big Data analytics, e.g. by making transportation networks more efficient or facilitating drug research.
BDE Webinar: SC6 - EUROPE IN A CHANGING WORLD -INCLUSIVE, INNOVATIVE AND REFL...BigData_Europe
BIG DATA EUROPE WEBINAR: SC6 - EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES: NEW GENERAL DATA PROTECTION REGULATION ADOPTED. 25.05.2016, 15:00–16:00 CEST, by VIGDIS KVALHEIM (CESSDA, DEPUTY DIRECTOR, NSD).
Big Data Europe Introduction by Ivana Ilijasic Versic (CESSDA) and Martin Kaltenböck (SWC).
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...BigData_Europe
Presentation at the Big Data Europe SC6 workshop #3 on 11.9.2017 in Amsterdam, co-located with the SEMANTiCS2017 conference: BDE Pilot Societal Challenge 6: CITIZEN BUDGET ON MUNICIPAL LEVEL by Martin Kaltenboeck (Semantic Web Company, SWC).
Project Description of the Linked Open Data (LOD) PILOT Austria - presented at the PiLOD event at VU Amsterdam (Netherlands) on 29.01. 2014 (see: http://www.pilod.nl/) by Martin Kaltenböck of Semantic Web Company.
IES Cities Hackathon, Zaragoza, 10-12 July 2015
IES Cities Project Overview and API
IES Cities Explanation
What does IES Cities propose?
Main objectives
Added value
IES Cities Apps examples
IES Cities Platform and APIS
Hackathon contest and conditions
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...FIWARE
Cities as Enablers of the Data Economy: Smart Data Models for Cities - 21 October 2020
Corresponding webinar recording: https://youtu.be/b0EWq5E5jAc
Speaker: Alberto Abella (Data Modeling Expert and Technical Evangelist, FIWARE Foundation)
Chapter: Smart Cities
Difficulty: 2
Audience: Technical Domain Specific
Demonstration of the functionality and capabilities of the Virtual Hubs developed for the ENERGIC-OD project. Presentation given at the GEO Business 2017 trade show in London, UK (May 2017).
SC2 Workshop 1: Big Data Europe (BDE) - Project Overview & Food WorkshopBigData_Europe
“Lightning talk” in the Big Data Europe (BDE) workshop on “Big data for food, agriculture and forestry: opportunities and challenges” taking place on 22.9.2015 in Paris by Sören Auer (Fraunhofer IAIS, University of Bonn) - BDE project lead.
SC6 Workshop 1: What can big data do for you? BigData_Europe
Presentation by Sören Auer, Fraunhofer IAIS, Coordinator of Big Data Europe, at the first workshop of Societal Challenge 6 in the BigDataEurope project, taking place in Luxembourg on 18 November 2015.
http://www.big-data-europe.eu/social-sciences/
European Data Portal - ePSI platform webinar 8 February 2016EuropeanDataPortal
All presentations given during the ePSI platform webinar that was held on 8 February 2016.
The agenda of the webinar:
1) Opening by the European Commission.
2) Introduction to the EDP project
3) Demo of the Portal
4) Technical architecture
5) Focus on CKAN extensions developed
6) Focus on maps application
7) Next steps
8) Discussion and tips & tricks for open data implementation
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lake-houses. In this talk we describe Tag.bio, a next generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, Self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, statistical and machine learning algorithms into decentralized data products for users to discover insights using FAIR Principles. Researchers can use its point and click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook based developer environments with individual workspaces.
Join us for a talk/demo session on Tag.bio data mesh platform and learn how major pharma industries and university health systems are using this technology to promote value based healthcare, precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real world evidence datasets, privacy centric data products (confidential computing) as well as integration with cloud services
Similar to BDE SC6-hang out - technology part-SWC - Martin (20)
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...BigData_Europe
Talk at the Big Data Europe SC6 workshop number 3 taking place on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017 conference: The Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce.
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...BigData_Europe
Where we are and are going for Big Data in OpenScience
Keynote talk at the Big Data Europe SC6 Workshop on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017: The perspective of European official statistics by Fernando Reis, Task-Force Big Data, European Commission (Eurostat).
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
Slides for keynote talk at the Big Data Europe workshop nr 3 on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017 conference by Ron Dekker, Director CESSDA: European Open Science Agenda: where we are and where we are going?
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
Slides of the keynote at the 3rd Big Data Europe SC6 Workshop, co-located with SEMANTiCS2017 in Amsterdam (NL), on: The European Research Data Landscape: Opportunities for CESSDA by Peter Doorn, Director DANS; Chair, Science Europe W.G. on Research Data; Chair, CESSDA ERIC General Assembly.
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BigData_Europe
Options for Wind Farm performance assessment and Power forecasting (Mr. A. Kyritsis, ALTSOL/TERNA) at the BigDataEurope Workshop, Amsterdam, November 2017.
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...BigData_Europe
Big Data Europe: Workshop 3 SC6 Social Science - 11.09.2017 in Amsterdam, co-located with SEMANTiCS2017 titled: THE IMPORTANCE OF METADATA & BIG DATA IN OPEN SCIENCE. Slides by Ivana Versic (Cessda) and Martin Kaltenböck (SWC)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
Overview of Open PHACTS, the BDE Pilot project in SC1, presented at BDE SC1 Workshop 3, 13 December, 2017.
https://www.big-data-europe.eu/the-final-big-data-europe-workshop/
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)BigData_Europe
Overview of the Big Data Europe project presented at BDE SC1 Workshop 3, 13 December, 2017.
https://www.big-data-europe.eu/the-final-big-data-europe-workshop/
SC1 Hangout: Updating public databases: Automation and other challenges for c...BigData_Europe
A recording of this webinar can be found at https://youtu.be/IqG3j5b-CXQ
Keeping databases up-to-date is a significant challenge with the rate at which many data sources are growing. Open PHACTS and Big Data Europe organised this webinar to hold an open, informal discussion around keeping databases updated – from user needs, to the challenges of automation, to potential technical approaches underpinning key data sources.
Joining our panel are Dr Evan Bolton, who manages the PubChem project at NCBI, and Professor Chris Evelo, Co-Founder and Director at WikiPathways.
Richard's entangled aventures in wonderlandRichard Gill
Since the loophole-free Bell experiments of 2015 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on how data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5 − 15. These objects show compact half-light radii of R_1/2 ∼ 50 − 200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
BDE SC6-hang out - technology part-SWC - Martin
1. BIG DATA EUROPE
PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL
EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES
HANG OUT
28 SEPTEMBER 2016
MARTIN KALTENBOECK (CFO, SEMANTIC WEB COMPANY)
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
BDE SC6 Hangout
2. Big Data Europe (CSA: 2015-17)
Show societal value of Big Data: 7 Domains
Lower barrier for using big data technologies
o Required effort and resources
o Limited data science skills
Help establish cross-lingual/organizational/domain Data Value Chains
26-oct.-16
3. Big Data Europe
CSA: Measures and Results
COORDINATION
Measure: Stakeholder Engagement (Requirements Elicitation)
Result: Create and Manage Societal Big Data Interest Groups
SUPPORT
Measure: Design, Realise, Evaluate Big Data Aggregator Platform
Result: Cloud-deployment ready Big Data Aggregator Platform
4. THE BDE PLATFORM
ARCHITECTURE & COMPONENTS
7. Adding a Semantic Layer to Data Lakes
Semantic Data Lake
• central place for model, schema and data historization
• combination of scale-out (cost reduction) and semantics (increased control & flexibility)
• grows incrementally (pay-as-you-go)
[Figure: semantic data lake architecture]
Departments: Manufacturing, Marketing, Sales, Support, Accounting
Inbound Data Sources
Inbound Raw Data Store
Data Lake (order of magnitude cheaper scalable data store)
Knowledge Graph for Relationship Definition and Meta Data
Frontend to Access Relationship and KPI Definition / Documentation
Frontend to Access (ad hoc) Reports
Outbound Data Delivery to Target Systems
Outbound and Consumption
Mapping formats: JSON-LD, CSVW, R2RML, XML2RDF
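The semantic layer on this slide lifts flat inbound data into linked-data formats such as JSON-LD. As a minimal sketch of that idea, the following Python snippet maps one CSV row to a JSON-LD node. The vocabulary IRI, the base IRI, the column names and the sample row are all assumptions made up for this illustration; they do not come from the BDE pilot.

```python
import csv
import io
import json

# Hypothetical vocabulary and base IRIs, chosen only for this illustration.
VOCAB = "http://example.org/budget#"
BASE = "http://example.org/budget-line/"


def row_to_jsonld(row):
    """Map one flat CSV row (a dict of strings) to a JSON-LD node with typed values."""
    return {
        "@context": {"@vocab": VOCAB},
        "@id": BASE + row["id"],
        "@type": "BudgetLine",
        "municipality": row["municipality"],
        "year": int(row["year"]),
        "amount": float(row["amount"]),
    }


# Made-up sample input standing in for an inbound raw data file.
raw_csv = "id,municipality,year,amount\nex-2016-001,Exampletown,2016,1250.50\n"

nodes = [row_to_jsonld(row) for row in csv.DictReader(io.StringIO(raw_csv))]
print(json.dumps(nodes[0], indent=2))
```

Because every node carries an `@id` and a shared `@context`, records from different inbound sources can later be merged into one graph without a rigid upfront schema.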
8. Why use BDE Technology?
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom
9. SC6 PILOT
CITIZENS BUDGET ON MUNICIPAL LEVEL
ARCHITECTURE & COMPONENTS
11. SC6: Social Sciences
26-oct.-16 www.big-data-europe.eu
Pilot focus area: Citizens budget spending on municipal level
Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data
12. 4 Vs of Big Data in SC6 Pilot
Variety: requirement arising from the harvesting of budget data and budget execution data from several sources, available in different structures and formats.
Volume: requirement regarding the growing amount of open budget data and budget execution data available.
Velocity: requirement regarding budget execution data that is provided on a continuous basis by the publisher (daily, weekly, monthly).
Veracity: refers to the biases, noise and abnormality in data. Even within the same country, published data differ because they often come from different systems, or because public accounting standards are not enforced uniformly (e.g. across different municipal departments).
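The Variety and Veracity requirements can be illustrated with a small Python sketch that brings two differently structured sources into one common record shape and filters out implausible entries. Both source layouts, all field names, the plausibility thresholds and the sample values are hypothetical; the pilot's actual formats are not described on the slides.

```python
import csv
import io
import json


def from_csv(text):
    """Source A (hypothetical): semicolon-separated CSV with decimal commas."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield {
            "municipality": row["gemeinde"].strip(),
            "year": int(row["jahr"]),
            # "1.250,50" -> "1250.50": drop thousands dots, then swap the comma.
            "amount_eur": float(row["betrag"].replace(".", "").replace(",", ".")),
        }


def from_json(text):
    """Source B (hypothetical): JSON list with amounts given in euro cents."""
    for item in json.loads(text):
        yield {
            "municipality": item["city"].strip(),
            "year": int(item["fiscal_year"]),
            "amount_eur": item["amount_cents"] / 100,
        }


def is_plausible(record):
    """Very simple veracity check: no negative amounts, year in a sane range."""
    return record["amount_eur"] >= 0 and 1990 <= record["year"] <= 2100


# Made-up sample inputs for the two sources.
csv_text = "gemeinde;jahr;betrag\nMusterstadt;2016;1.250,50\n"
json_text = (
    '[{"city": "Sampleville", "fiscal_year": 2016, "amount_cents": 99900},'
    ' {"city": "Sampleville", "fiscal_year": 2016, "amount_cents": -500}]'
)

candidates = list(from_csv(csv_text)) + list(from_json(json_text))
records = [r for r in candidates if is_plausible(r)]
print(records)
```

Keeping one small adapter per source means a newly docked source only needs its own adapter, while the veracity check and everything downstream stay unchanged.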
13. SC6 Pilot - Architecture
14. SC6 Pilot: Technical
Components
Apache Flume, https://flume.apache.org/ (data ingestion)
Apache Kafka, http://kafka.apache.org (messaging service)
Apache Spark, http://spark.apache.org (distributed analysis, transformation)
Apache HDFS, http://hadoop.apache.org (raw data storage)
SWCs’ PoolParty Semantic Suite, http://poolparty.biz (data consolidation, curation,
mapping)
OpenLink s’ Virtuoso, http://virtuoso.openlinksw.com (triple store – data storage)
Apache HTTP, http://httpd.apache.org (linked data serving)
Apache Avro, http://avro.apache.org/docs/current/ (intermediate data schema)
D3 JS Library, https://d3js.org/ (visualisation of RDF data using SPARQL queries)
SWC's PoolParty GraphSearch (SPARQL-based interface component for filter & faceted search)
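The component list above ends in a triple store, so at some point each harmonised budget record has to become RDF. The slides do not show the pilot's vocabulary or URI scheme; the sketch below assumes a hypothetical namespace (`example.org`) and property names, and writes N-Triples by hand with the standard library only. A real deployment would more likely use an RDF library or the mapping tools named in the list.

```python
from urllib.parse import quote

BASE = "http://example.org/budget/"   # hypothetical resource namespace
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(record):
    """Serialise one budget record as a list of N-Triples lines."""
    # Build the subject IRI from percent-encoded path segments.
    parts = [quote(str(record[key]), safe="") for key in ("municipality", "year", "id")]
    subject = "<" + BASE + "/".join(parts) + ">"
    department = escape_literal(record["department"])
    amount = str(record["executed"])
    return [
        f'{subject} <{VOCAB}department> "{department}" .',
        f'{subject} <{VOCAB}executed> "{amount}"^^<{XSD_DECIMAL}> .',
    ]


# Invented sample record.
lines = to_ntriples({
    "municipality": "Athens",
    "year": 2016,
    "id": 42,
    "department": 'Parks "North"',
    "executed": "400.50",
})
```

The amount is kept as a string so that the decimal literal is written exactly as published, without floating-point rounding.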
16. SC6 Pilot: Pilot Evaluation
Evaluation Approach SC6 Pilot:
Invite municipalities to evaluate and use the system
Invite community (open data, data community, BDE community, W3C)
Evaluate within the 2 participating projects (BDE, YourDataStories)
BDE SC6 workshop in Cologne, 5 December 2016, plus the overall BDE Tech workshop (ApacheCon)
Additional evaluation – tests over time with
a growing amount of data
a growing number of different sources & formats docked onto the system
additional analytics in place
17. How to benefit best from BDE
Health: 19 October, Brussels. Standalone workshop.
Food & Agri: 30 September 2016, Brussels. Collocated with DG AGRI WP2018-20 stakeholder consultation.
Energy: 20 September 2016, Brussels. Collocated with H2020 Energy InfoDay (19th).
Transport: 16 September 2016, Brussels. Collocated with TM 2.0 Steering Body meeting.
Climate: February 2017, Brussels. Collocated with EC JRC ISPRA Workshop.
Societies: 5 December 2016, Cologne. Collocated with EDDI16, the 8th Annual European DDI User Conference: http://bde-sc6-2016.eventbrite.com (40 seats).
Security: 18 October 2016, Brussels. Standalone workshop.
• BDE Workshops & Webinars
• Use & expand the BDE Platform
• Visit Website: news, events, community, …
• Big Data Europe W3C Community
18. Contacts:
CESSDA, http://cessda.net/
Ivana Ilijasic Versic, ivana.versic@cessda.net
Hossein Abroshan, hossein.abroshan@cessda.net
NCSR-D, http://www.demokritos.gr/?lang=en
Michalis Vafopoulos, vafopoulos@gmail.com
Semantic Web Company (SWC), http://www.semantic-web.at
Martin Kaltenböck, m.kaltenboeck@semantic-web.at
Jürgen Jakobitsch, j.jakobitsch@semantic-web.at
Project objectives:
For each of the seven Societal Challenge domains, we have a domain representative and a pilot instantiation of the BDE platform in progress.
One of the obstacles to Big Data opportunities is the lack of data science skills. Our aim is to provide out-of-the-box technology that requires little training to use and apply.
BDE technology can be applied in multiple domains and in different phases within Data Value Chains, working with different data providers and addressing multiple objectives (as opposed to current solutions, which tend to be very specific to one data source or domain and address one objective).
A Data Lake is a storage repository for big-data-scale raw data in its original formats.
Late-binding approach to schema: “Let us decide when we need it.”
Scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a large cost advantage: about a factor of 10 compared to data warehouses.
Semantic Data Lake = Data Lake + Knowledge Graph
Management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph: a complex data fabric representing all kinds of things and how they relate to each other.
A knowledge graph stands out for its flexibility, multiple views and metadata capabilities.
It is based on the Resource Description Framework (RDF) standard and Linked Data principles.
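The late-binding idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the BDE implementation: the "lake" is a list of raw JSON strings, the "knowledge graph" is a set of subject-predicate-object tuples, and the predicate names `mapsTo` and `hasDatatype` are invented for the example.

```python
import json

# "Data lake": raw records kept exactly as received, no schema enforced on write.
RAW_LAKE = [
    ("city_a", '{"dept": "Parks", "spent": "400"}'),
    ("city_b", '{"unit": "Schools", "execution": "1500", "note": "draft"}'),
]

# "Knowledge graph": (subject, predicate, object) triples describing structure.
# All names below are hypothetical.
GRAPH = {
    ("city_a:dept", "mapsTo", "department"),
    ("city_a:spent", "mapsTo", "executed"),
    ("city_b:unit", "mapsTo", "department"),
    ("city_b:execution", "mapsTo", "executed"),
    ("executed", "hasDatatype", "decimal"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in GRAPH if s == subject and p == predicate]


def read_with_schema(source, raw):
    """Apply the schema at read time (late binding), not at write time."""
    out = {}
    for key, value in json.loads(raw).items():
        targets = objects(f"{source}:{key}", "mapsTo")
        if not targets:
            continue  # unmapped fields stay in the lake, untouched
        field = targets[0]
        if "decimal" in objects(field, "hasDatatype"):
            value = float(value)
        out[field] = value
    return out


view = [read_with_schema(source, raw) for source, raw in RAW_LAKE]
```

Adding a mapping for `city_b:note` later changes what the view shows without rewriting any stored data, which is the practical benefit of deciding on schema only when it is needed.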