This lecture highlights current trends, challenges and opportunities related to the emergence of large amounts of data. It also presents Sirris’s recent research activities in this domain.
2. Innovate with Data
• Introduction
• Dr. Elena Tsiporkova, Senior Technology Advisor, Sirris
• Data Driving Innovation
• Gabriel Reid, Senior Software Engineer, TomTom
• Traffic data and mobility content
• Dr. Steven Logghe, Chief Traffic, BE-Mobile
• Lily, Smart Data at Scale made Easy
• Steven Noels, CEO, Outerthought
3. Data is fueling the knowledge-based economy and society
Elena Tsiporkova & Tom Tourwé
ICT & Software Engineering
the collective centre of the Belgian technology industry
4. Data is omnipresent …
• Web of Data
• Social Media: LinkedIn, Xing, Twitter, Facebook, …
• Media and Publishing: news & events sites, blogs, discussion forums, …
• Data (Research) Repositories: Wikipedia, GoogleScholar, DBLP, PubMed,
CiteSeer, …
• Enterprise Data Web: enterprise-related, business, financial and
regulatory data published by commercial organisations
• Government Data for Citizens: institutional, infrastructure, economic,
political, legal and social information provided by governments
5. … and more data
• Internet of things / Web of Events
• Dynamic Traffic Monitoring: traffic management, smart taxis
• Video Surveillance Systems: security, ambient assisted living, …
• Smart Factories: monitoring and control of manufacturing processes,
e-maintenance, energy-aware & agile manufacturing
• Smart buildings: energy & water management, smart sensors & smart grid
• Crisis Management: emergency dispatching, flooding monitors, …
6. … and more data
• Health
• Medical imaging
• Real-time video feeds created during surgery
• Permanent (mobile) telemonitoring
• Patient records
• Life sciences
• DNA sequencing
• High-throughput screening
• R&D
• LHC@CERN generates 15 petabytes/year
7. … and even more data is becoming
available
• Every device is or will be connected
• Mobile phones, computers
• Cars, bikes
• Fridges, energy meters
• Creation of data as a by-product
• Usage data, logs
• Growing user-generated content
• Social media, blogs, discussion forums, ...
• Wide adoption of the open data initiative by governments
• Linked Open Data
• Technology evolution
• (Elastic) cloud computing
• Parallelism on commodity hardware
• Low hardware and network connectivity costs
8. Data creates opportunities …
• Improved Decision Making
• Uncover hidden insights and infer additional knowledge from data
• Enable advanced visualization of trends and patterns
• Reduce information overload and target proactive information delivery
• safety-critical environments / financial domains:
decisions need to be made in a matter of
seconds, and nobody has a global overview of
the exact situation
• e-maintenance: use historical data to predict
when a machine will need maintenance
because of hardware failure
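The e-maintenance idea above can be sketched as a simple trend extrapolation: fit a straight line to historical sensor readings and estimate when the trend reaches a failure threshold. The readings, the threshold and the function names are hypothetical, and a straight-line fit is an assumption made for illustration; a real system would likely need a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, readings, threshold):
    """Estimate the operating hours at which the fitted trend reaches threshold.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical vibration readings (mm/s), logged every 100 operating hours.
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]

# Hypothetical failure threshold of 7.0 mm/s.
predicted = hours_until_threshold(hours, vibration, threshold=7.0)
print(predicted)
```

With these made-up readings the trend reaches the threshold at about 1000 operating hours, which is the point at which maintenance would be scheduled.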
9. … and more opportunities
• Innovations in business models, products and services
• Free products & services in return for data ownership
• e.g. free restaurant inquiry phone service
• Transition from selling a product to selling a service
• e.g. renting of machines equipped with sensors
• Emergence of data market places
10. … and even more opportunities …
… as customers, consumers and citizens are becoming both direct
and indirect beneficiaries of data
• Large-scale genetic data: personalized therapies tailored to the patient’s genetic profile
• Open government data policy: higher level of engagement of citizens with the government
• Real-time traffic information: reduced travelling time and fuel consumption
• Commercially valuable user information: better match between products and customer needs
11. Data poses challenges …
• Technological
• Scalable data storage and processing
• Data format standardization (RDF, linked data)
• Data integration from heterogeneous sources e.g. public data with
purchased data with proprietary data
• Adaptive software supporting data acquisition
• Real Time Information Processing
• Event Recognition for Intelligent Resource Management
• Manage a large population of devices
• Decentralized intelligence
• Discovery and mapping of real, digital and virtual entities
• Consider information from human behaviours and multi-modal
interactions
• Act on behalf of the users’ intentions
• …
12. … and more challenges …
• Organisational
• A shortage of talent, which takes years of training to develop:
• people with deep expertise in statistics and machine learning
• multi- and cross-discipline expertise
• Managers and analysts who know how to operate companies by using
insights from data
• Gathering all the internally available data in a central place
• Adequate infrastructure needs to be put in place
• Policy-related
• Privacy
• Security
• Intellectual Property Rights
• Liability
• Ethics
15. Linked Data - Connect Distributed Data
across the Web
• Web of Data
• loosely structured & disconnected data
• difficult to integrate & query
• Linked data is a way of publishing data on the Web that:
• encourages reuse
• reduces redundancy
• maximizes its (real and potential) inter-connectedness
• enables network effects to add value to data
• Community project with W3C support
• Began early 2007
16. How does it work in practice?
• Consider existing open data sets
• Wikipedia, Geonames, WordNet, DBLP bibliography, …
• Make them available on the Web in RDF format
• use triples to describe relationships between data items
• subject-predicate-object
• Interlink them by setting RDF links between data items from different
data sources
[Figure: example RDF graph; link labels: about, subject of, written by, author of]
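The subject-predicate-object idea can be illustrated with plain Python tuples. This is a minimal sketch and not an RDF library: the URIs under example.org and the predicate names are made up for illustration, loosely following the "written by" and "about" links above.

```python
# Each triple is (subject, predicate, object). All URIs are hypothetical.
BOOKS = "http://books.example.org/"
PEOPLE = "http://people.example.org/"
TOPICS = "http://topics.example.org/"

# Data set 1: a bibliography.
bibliography = [
    (BOOKS + "book1", "writtenBy", PEOPLE + "alice"),
    (BOOKS + "book2", "writtenBy", PEOPLE + "alice"),
]

# Data set 2: a topic catalogue that refers to items from data set 1.
catalogue = [
    (BOOKS + "book1", "about", TOPICS + "linked-data"),
    (BOOKS + "book2", "about", TOPICS + "genomics"),
]


def objects(triples, subject, predicate):
    """Return every object that matches the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Return every subject that matches the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def topics_of_author(triples, author):
    """Follow writtenBy links backwards, then about links forwards."""
    topics = []
    for book in subjects(triples, "writtenBy", author):
        topics.extend(objects(triples, book, "about"))
    return topics


# Both sets use the same identifiers, so merging them is plain concatenation.
web_of_data = bibliography + catalogue
alice_topics = topics_of_author(web_of_data, PEOPLE + "alice")
print(alice_topics)
```

The query across both data sets only works because they share identifiers for the books, which is the point of setting links between data items from different sources.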
18. DBpedia
• A community effort to extract structured information from Wikipedia
and to make this information available on the Web.
• Allows users to ask sophisticated queries against Wikipedia, and to link
other data sets on the Web to Wikipedia data.
Initiated by people at the Free University of Berlin and the
University of Leipzig, in collaboration with OpenLink Software
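Queries against DBpedia are typically written in SPARQL and sent to a public endpoint over HTTP. The sketch below only builds the request URL and does not contact the network. The endpoint address, the dbo:birthPlace property and the "format" parameter are assumptions based on the current public DBpedia service and may differ from what was available when these slides were written; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed address of the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Example question: which resources have Berlin as their birth place?
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  ?person dbo:birthPlace dbr:Berlin .
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Build a GET URL carrying the SPARQL query as a URL-encoded parameter."""
    params = {
        "query": query,
        # Assumed server-specific parameter asking for JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(DBPEDIA_ENDPOINT, QUERY)
print(url)
```

Fetching this URL with any HTTP client would return the matching resources, provided the endpoint behaves as assumed.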
20. No two humans are genetically identical
• Human genetic variation refers to genetic differences both within
and among populations
• The 1000 Genomes Project is the first project to provide a
comprehensive resource on human genetic variation
• The study of human genetic variation has
• Evolutionary significance
• helps understand ancient human population migrations as well as how different
human groups are biologically related to one another
• Large impact in medical genomics research
• helps identify genetic causes of diseases which occur at a greater frequency in
people from specific geographic regions
21. 1000 Genomes Project
• Largest data collection project as yet undertaken in biology
• A community resource project
• launched in January 2008 with the participation of 75 universities and
companies from around the world
• with the aim to
• sequence the genomes (DNA sequencing) of at least one thousand anonymous
participants from a number of different ethnic groups
• release data rapidly for the benefit of the scientific community
22. DNA sequencing
• The main role of DNA is the long-term storage of genetic information
used in the development and functioning of all living organisms
• The information in DNA is stored as a code made up of four chemical
bases: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T)
• Determining the order of these bases is called DNA sequencing
• The human genome consists of approximately 3 billion DNA base
pairs
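Since there are only four bases, each base can be represented in 2 bits. The sketch below packs a short sequence that way and estimates the raw size of a 3-billion-base genome under that assumption. The 2-bit mapping and the function names are illustrative choices, not a standard file format, and real sequencing data carries extra information such as quality scores.

```python
# Hypothetical 2-bit code for the four bases.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence):
    """Pack a DNA string into a single integer, 2 bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


sequence = "GATTACA"
packed = pack(sequence)
restored = unpack(packed, len(sequence))
print(restored)

# Raw size of one genome at 2 bits per base, in decimal megabytes.
GENOME_BASES = 3_000_000_000
genome_megabytes = GENOME_BASES * 2 / 8 / 1_000_000
print(genome_megabytes)  # 750.0
```

Under this encoding a single genome needs about 750 MB before any compression or metadata.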
23. The results from the pilot phase …
• … have revealed some 15 million genetic variants, more than half of
which had never been observed before
• … and generated datasets of over 50 terabytes, corresponding to
almost eight trillion DNA base pairs of sequence data
• The results also highlight the fact that there is
still an enormous amount left to learn
• In its next phase, the project will expand its
sequencing efforts further to 2500 individuals
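A back-of-the-envelope check puts the pilot figures in perspective. It assumes decimal units (1 terabyte = 10^12 bytes) and uses the rounded numbers quoted above together with the approximate 3-billion-base genome size, so the results are rough orders of magnitude only.

```python
PILOT_BYTES = 50 * 10**12   # "over 50 terabytes", decimal units assumed
PILOT_BASES = 8 * 10**12    # "almost eight trillion DNA base pairs"
GENOME_BASES = 3 * 10**9    # approximate size of one human genome

# How many whole-genome lengths of sequence the pilot produced.
genome_equivalents = PILOT_BASES / GENOME_BASES

# Average storage used per sequenced base.
bytes_per_base = PILOT_BYTES / PILOT_BASES

print(round(genome_equivalents))  # 2667
print(bytes_per_base)             # 6.25
```

Several bytes per base suggests that the data sets store more than the bare sequence; that reading is an interpretation and not a claim made in the slides.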
26. Pro-active decision support in data-intensive environments
• Pro-actively push relevant information
• Present information in a user-specific and context-aware way
• Optimise the choices available
• Keep the user in control
• Provide the right dosage of information at the right time
• Implement intention-aware adaptive automation (trading of control)
27. ASTUTE aims …
• … to develop a platform for building embedded products that
capture and act upon user intentions …
• … taking into account the user’s context (i.e. user environment and
all factors influencing the user performance) and state (i.e. aspects
determining the ability of the user to perform in a given situation) …
• … in order to turn the overwhelming amount of information into
targeted, context-sensitive advice …
• … so as to focus and enhance the user's attention and decision
making
28. ASTUTE application domains
• Smart Embedded Emergency Dispatching System
A decentralized solution for emergency management
• Embedded Driver Infotainment System
An intelligent visual system, to be run on personal positioning and
navigation devices
• Intelligent Cockpit
An empathic system to support anticipation during flight operations
• Virtual Control Room
Smart control room systems for data & event intensive applications for
buildings and manufacturing management
29. Conclusions
… modern economic activity, innovation, and
growth simply couldn’t take place without data …
30. What follows …
• Data Driving Innovation
• Gabriel Reid, Senior Software Engineer, TomTom
• Traffic data and mobility content
• Dr. Steven Logghe, Chief Traffic, BE-Mobile
• Lily, Smart Data at Scale made Easy
• Steven Noels, CEO, Outerthought