#Big Data in #Austria
Big Data – Challenges and Potentials
Mario Meir-Huber and Martin Köhler
European Data Economy Workshop, Semantics 2015
15.09.2015, Vienna, Austria
Study „#BigData in #Austria“
• Project duration: 1.11.2013 – 30.04.2014
• Project partners:
  • IDC Central Europe GmbH
  • AIT Austrian Institute of Technology, Mobility Department
• Contact persons:
  • Mario Meir-Huber, IDC (Teradata)
  • Martin Köhler, AIT
• Content:
  • State-of-the-art in Big Data
  • Market analysis
  • Best practices for Big Data projects
• Download (in German):
  • FFG „Studies of ICT of the Future“: https://www.ffg.at/studien-aus-ikt-der-zukunft
#Big Data in #Austria was funded under the „ICT of the Future“ programme of the Austrian Research Promotion Agency (FFG) and the Austrian Ministry for Transport, Innovation and Technology (BMVIT).
Data-intensive science
Visit the project: http://bigdataaustria.wordpress.com
• Enormous data archives are at hand
• Various data sources
• Often available in real-time
• Investigating huge data volumes drives both research and industry
• Science is moving increasingly from hypothesis-driven to data-driven discoveries
• Correlation vs. causality
Big Data Definition
“Big Data” is a term encompassing the use of techniques to capture, process,
analyse and visualize potentially large datasets in a reasonable timeframe
not accessible to standard IT technologies. By extension, the platform, tools
and software used for this purpose are collectively called “Big Data
technologies”.
NESSI White Paper, December 2012
Four characteristics:
• Volume: The amount of generated data has increased enormously in recent years
• Velocity: Analysing more data in shorter time frames
• Variety: Huge diversity of data formats (from relational data to free text and arbitrary formats)
• Value: Extracting value (knowledge) from the data
Hardware and software technologies for managing and analyzing huge amounts of data.
Or, simply put: IF DATA IS PART OF THE PROBLEM
Big Data Dimensions
• Legal dimension: copyright, privacy
• Social dimension: user behaviour, collaboration, social implications
• Economic dimension: business models, benchmarking, pricing
• Technological dimension: scalable data processing, signal processing, statistics, linguistics, HCI/visualization, electronic archiving
• Application dimension: decision support, industry solutions
Big Data Technology Stack
Layers of the technology stack (figure):
• Big Data Management: data centers, cloud (IaaS), virtualization, network, compute, storage; scalable data storage (DBMS, NoSQL)
• Big Data Platforms: Hadoop ecosystem, data ingestion and processing, parallel programming, platform tools, workload, efficiency, trust, governance
• Big Data Analytics: data science (transforming questions into algorithms), machine learning, analysis, integration, query, performance, transformation, warehousing
• Big Data Utilization: domain expertise (asking the right question), reporting & dashboards, alerting & recommendations, business intelligence, text analysis and search
• Cross-cutting concerns: management, security, privacy, governance
• Data value increases from the lower management layers towards utilization
Big Data Management
• Technologies for the efficient management of huge amounts of data
  • Storage and management of the data
  • Provisioning and management of the infrastructure
Infrastructure building blocks (figure): cloud resources, (internal) data centers, storage (a small sharding sketch follows below)
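To make the storage side concrete, here is a minimal, hypothetical sketch of how a scalable data store can spread records over several nodes by hashing keys; the node names and record layout are invented for illustration and are not taken from the study.

```python
# Minimal sketch (illustration only): hash-based sharding, the basic idea
# behind distributing records across several storage nodes.
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

def node_for_key(key: str) -> str:
    """Map a record key deterministically to one of the storage nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Toy "cluster": one dict per node stands in for a real storage backend.
cluster = {node: {} for node in NODES}

def put(key: str, value: dict) -> None:
    cluster[node_for_key(key)][key] = value

def get(key: str) -> dict:
    return cluster[node_for_key(key)][key]

if __name__ == "__main__":
    put("sensor:42", {"temp": 21.5})
    print(node_for_key("sensor:42"), get("sensor:42"))
```

Real systems (e.g. NoSQL stores) add replication, rebalancing and failure handling on top of this basic partitioning idea.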
Big Data Platforms
• Technologies for the (massively) parallel execution of data analytics on huge amounts of data
  • Provisioning of parallelized and scalable execution systems
  • Real-time integration of sensor data
Platform categories (figure); a minimal MapReduce-style example is sketched below:
• Massively parallel programming: programming models for data-intensive applications (e.g. MapReduce, Google Pregel)
• High-level query languages: scripting languages and abstractions over low-level data-intensive query languages
• Streaming: real-time processing of (sensor) data that cannot be stored
• Ad-hoc queries: real-time access to huge amounts of data; query optimization (SQL vs. MapReduce), e.g. Apache Drill
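As an illustration of the massively parallel programming model mentioned above, the following sketch runs the classic MapReduce word count locally in plain Python; it only mimics the map, shuffle and reduce phases in a single process and is not tied to any particular platform.

```python
# Minimal sketch (illustration only): the MapReduce programming model shown
# as a local, single-process word count. Real platforms run the same
# map/shuffle/reduce pattern distributed over many machines.
from collections import defaultdict

def map_phase(document: str):
    """Map: emit (word, 1) pairs for every word in the input."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate the values of one key."""
    return key, sum(values)

if __name__ == "__main__":
    docs = ["big data in austria", "data driven research in austria"]
    pairs = (pair for doc in docs for pair in map_phase(doc))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # e.g. {'big': 1, 'data': 2, 'in': 2, 'austria': 2, ...}
```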
Big Data Analytics
• Technologies for extracting information and knowledge from huge amounts of data
  • Pattern recognition (see the clustering sketch below)
  • Pattern matching
  • …
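As one example of pattern recognition, the sketch below clusters a handful of invented feature vectors with k-means from scikit-learn; the data points and the interpretation of the clusters are made up for illustration and are not from the study.

```python
# Minimal sketch (illustration only): pattern recognition via k-means
# clustering, i.e. finding groups of similar records in data.
from sklearn.cluster import KMeans

# Toy feature vectors, e.g. (avg. trip length in km, trips per day) per user.
points = [
    [2.0, 10.0], [2.5, 12.0], [3.0, 11.0],   # frequent short trips
    [40.0, 1.0], [45.0, 2.0], [50.0, 1.0],   # rare long trips
]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # cluster assignment per point
print(model.cluster_centers_)  # learned cluster centroids
```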
Big Data Utilization
• Technologies for extracting value from data
  • Strengthening the market position of an organization
  • Technologies for the (simplified) utilization of data
Utilization categories (figure):
• Business Intelligence: provisioning of meaningful indicators based on data, such as reporting, KPIs and audits (see the KPI sketch below)
• Knowledge Management: management and representation of knowledge (ontologies, Linked Data, knowledge management systems)
• Decision Support: supporting decision making; incorporates data management, modelling, and innovative, interactive user interfaces
• Visualization: interactive visualization of complex information and networks at different levels of abstraction (visual analytics)
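To illustrate the business intelligence category, the following sketch computes a few KPIs per region with pandas; the order table and its column names are invented for illustration.

```python
# Minimal sketch (illustration only): a business-intelligence style KPI
# report computed with pandas from a small, made-up order table.
import pandas as pd

orders = pd.DataFrame({
    "region":   ["Vienna", "Vienna", "Graz", "Graz", "Linz"],
    "revenue":  [1200.0, 800.0, 450.0, 600.0, 300.0],
    "returned": [False, True, False, False, False],
})

kpis = orders.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    avg_order_value=("revenue", "mean"),
    return_rate=("returned", "mean"),
)
print(kpis)  # one KPI row per region, ready for a report or dashboard
```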
Traditional versus Data-intensive Approach
Traditional approach:
• Apply schema on write
• Heavily dependent on IT
• Workflow: determine a list of questions, design the solution, collect structured data, ask the questions from the list, detect additional questions
• Single query engine (SQL)
Hadoop approach (a schema-on-read sketch follows below):
• Apply schema on read
• Iterate over the structure, transform and analyze
• Support a range of access patterns to data stored in HDFS: polymorphic access
• Right engine for the right job: batch, interactive, real-time, in-memory
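The schema-on-read idea fits in a few lines: the sketch below stores raw JSON events unchanged and only applies a structure when a question is asked, in contrast to a fixed schema enforced at write time. Field names and records are invented for illustration.

```python
# Minimal sketch (illustration only): schema-on-read vs. schema-on-write.
import json

RAW_EVENTS = [                      # raw, semi-structured records as they
    '{"user": "a", "clicks": 3}',   # would land in HDFS / object storage
    '{"user": "b", "clicks": 7, "country": "AT"}',
]

# Schema-on-write (traditional): force every record into a predefined
# table layout before storing it.
SCHEMA = ("user", "clicks")

def write_row(record: dict) -> tuple:
    return tuple(record[col] for col in SCHEMA)   # fails if a column is missing

# Schema-on-read (Hadoop-style): store the raw data as-is and apply a
# schema only when a particular question is asked.
def read_clicks_by_country(raw_lines):
    for line in raw_lines:
        record = json.loads(line)
        yield record.get("country", "unknown"), record.get("clicks", 0)

if __name__ == "__main__":
    print([write_row(json.loads(l)) for l in RAW_EVENTS])   # rigid view
    print(list(read_clicks_by_country(RAW_EVENTS)))         # flexible view
```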
Technical and scientific challenges
• Visual analytics
  • Combine the strengths of human and electronic data processing
• Big Data analytics
  • Techniques making use of the complete data set instead of sampling
• Real-time analytics and (cross-)stream processing
  • Users expect real-time or near real-time responses from the systems (see the windowing sketch below)
• Content validation
  • Validating the vast amount of information in content networks; trust
Architecture sketch (figure): use cases and enterprise services (scientific data, life sciences, business reporting) on top of MapReduce extensions and parallel stream processing, backed by distributed storage (IaaS, NoSQL) across data centers
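For the real-time analytics challenge, the following sketch aggregates a time-ordered sensor stream over tumbling ten-second windows; it is a toy stand-in for a real stream processing engine, with invented sensor readings.

```python
# Minimal sketch (illustration only): near real-time stream processing with
# a tumbling time window. Assumes the events arrive ordered by timestamp.
from collections import defaultdict

WINDOW_SECONDS = 10

def windowed_averages(events):
    """events: iterable of (timestamp_s, sensor_id, value).
    Yields (window_start, sensor_id, average) once a window is complete.
    (For brevity, the last, still-open window is never flushed.)"""
    current_window, sums, counts = None, defaultdict(float), defaultdict(int)
    for ts, sensor, value in events:
        window = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        if current_window is not None and window != current_window:
            for s in sums:                      # flush the finished window
                yield current_window, s, sums[s] / counts[s]
            sums, counts = defaultdict(float), defaultdict(int)
        current_window = window
        sums[sensor] += value
        counts[sensor] += 1

if __name__ == "__main__":
    stream = [(1, "temp", 20.0), (4, "temp", 22.0), (12, "temp", 25.0)]
    for result in windowed_averages(stream):
        print(result)   # -> (0, 'temp', 21.0)
```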
Market analysis
• State-of-the-art in methods and tools
  • ~50 Big Data toolkits
• Analysis of Austrian market participants
  • ~60 Austrian and international companies
  • Industry analysis
• Tertiary education
  • Overview of Big Data topics in degree programmes
  • Research overview
• Open data portals and data sets
Global market
• IDC expects the global market to grow from 9.8 billion USD in 2012 to 32.4 billion USD in 2017
• Annual growth rate: 27% (see the quick check below)
• Austrian market 2013: ~23 million EUR
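A quick check of the growth figure: 27% is the compound annual growth rate implied by the 2012 and 2017 market sizes quoted above.

```python
# Quick check (illustration only): the ~27% figure is the compound annual
# growth rate (CAGR) implied by the 2012 and 2017 market sizes.
start, end, years = 9.8, 32.4, 5          # billion USD, 2012 -> 2017
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")                       # -> 27.0%
```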
Code of practice for big data projects
Support and orientation for the implementation of big data projects
• Reference projects
  • Medicine
  • Mobility
  • Earth observation
  • Crisis and disaster management
  • Trade
• Process model, maturity model, and reference architecture
Code of practice for big data projects
„We will soon have a huge skills shortage for data-related jobs.“
Neelie Kroes (ICT 2013, Nov. 7, Vilnius)
„Data Scientist: The Sexiest Job of the 21st Century“
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
Recommendations and implications
„Data is a commodity – competence is the key“
• Objectives: added value, market leadership, attractiveness of the location, enhanced competences, visibility
• Current status: competence, enabling data access, legislation, providing infrastructure
• Steps:
  • Focus, create and provide competences
  • Secure competences for the long term
  • Establish a holistic institution
  • Establish (international) legal certainty
  • Establish a general framework for data markets
  • Incentives for open data
  • Enhance funding for SMEs
Questions?
Mario Meir-Huber
mario@meirhuber.de
Martin Köhler, AIT
Koehler.martin@gmail.com