2. http://www.bigdata.uni-frankfurt.de/
Mission
The objective of the Big Data Laboratory is to carry out research in big data and data analytics from the perspectives of information systems and computer science.
Our approach is interdisciplinary, linking data management technologies and analytics.
The lab is located in Frankfurt, the financial metropolis of central Europe, and aims to be a source of knowledge and expertise for both research and industry applications.
Frankfurt Big Data Lab
The DATA REFUGEES Project
Team
Prof. Dott. Ing. Roberto V. Zicari
Dr. Karsten Tolle
Lab Director
Hee Eun Kim
PhD Student
Todor Ivanov
PhD Student
Marten Rosselli
PhD Student
Affiliations: Goethe University / Accenture
Sven Rill
PhD Student
Affiliation: Hof University of Applied Sciences
Rahul Soni
PhD Student
Affiliations: Goethe University / Accenture
Concha Sanchez-Ocaña
Project Manager
DBIS, Goethe University Frankfurt
Raik Niemann
PhD Student
Affiliation: Hof University of Applied Sciences
Research Areas
Our lab is currently active in the following research areas:
1. Big Data Management Technologies
2. Graph Databases / Linked Open Data (LOD)
3. Data Analytics / Data Science
4. Big Data for Social Good
1. Big Data Management Technologies
Our work concentrates on the evaluation and optimization of:
• Operational data stores that allow flexible schemas
• Big Data management and analytical platforms (Hadoop, Spark, etc.)
• Complex distributed storage and processing architectures
• Big Data benchmarks
1. Big Data Management Technologies (cnt)
Benchmarking Big Data platforms for performance, scalability, elasticity, fault tolerance, …
Benchmarks used
Yahoo Cloud Serving Benchmark (YCSB): Evaluating the performance (read/write workloads) of NoSQL stores such as Cassandra.
HiBench: 10 workloads for evaluating the Hadoop platform in terms of speed, throughput, HDFS bandwidth, system resource utilization and machine learning algorithms.
BigBench: Application-level benchmark consisting of 30 queries implemented in Hive, based on the TPC-DS benchmark.
TPCx-HS: The first standard Big Data benchmark for Hadoop, based on the TeraSort workload.
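YCSB measures a store under a configurable mix of read and write operations. The following is a minimal sketch of that idea, not YCSB itself: an in-memory Python dict stands in for a NoSQL store such as Cassandra, a load phase inserts records, and a transaction phase issues a 50/50 read/update mix (similar in spirit to YCSB's update-heavy workload A) and reports throughput. The record count, value size, key naming scheme and uniform key choice are assumptions; YCSB's standard workloads typically draw keys from a skewed (Zipfian) distribution.

```python
import random
import time

# Hypothetical stand-in for a YCSB-style run: an in-memory dict plays the role
# of the NoSQL store, and keys are chosen uniformly (an assumption; YCSB's
# standard workloads usually use a skewed key distribution).

def load(store: dict, record_count: int) -> None:
    """Load phase: insert record_count records with 100-byte values."""
    for i in range(record_count):
        store[f"user{i}"] = "x" * 100

def run(store: dict, record_count: int, op_count: int,
        read_proportion: float = 0.5, seed: int = 42) -> dict:
    """Transaction phase: a read/update mix; returns op counts and throughput."""
    rng = random.Random(seed)
    counts = {"read": 0, "update": 0}
    start = time.perf_counter()
    for _ in range(op_count):
        key = f"user{rng.randrange(record_count)}"
        if rng.random() < read_proportion:
            _ = store[key]            # read an existing record
            counts["read"] += 1
        else:
            store[key] = "y" * 100    # overwrite (update) an existing record
            counts["update"] += 1
    elapsed = time.perf_counter() - start
    ops_per_sec = op_count / elapsed if elapsed > 0 else float("inf")
    return {**counts, "ops_per_sec": ops_per_sec}

if __name__ == "__main__":
    store: dict = {}
    load(store, record_count=1_000)
    print(run(store, record_count=1_000, op_count=10_000))
```

Against a real cluster, the dict operations would be replaced by client calls to the store under test, and latency percentiles would normally be recorded alongside throughput.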
1. Big Data Management Technologies (cnt)
Platforms used
NoSQL
Evaluated Cassandra / DataStax Enterprise (9-node Cassandra cluster) with HiBench and the Yahoo Cloud Serving Benchmark (YCSB).
Hadoop Ecosystem
• Evaluated the performance of different virtualized Hadoop cluster configurations on top of VMware vSphere using the Big Data Extensions (Project Serengeti).
• Benchmarking the Cloudera Hadoop Distribution (CDH) 5.2 (4-node Hadoop cluster) with the TPCx-HS benchmark.
• Experimenting with the BigBench benchmark using Hive and Spark SQL.
In-Memory Databases
Evaluation of a Big Data architecture based on SAP HANA and Cloudera Hadoop for different use cases and analytical workloads.
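TPCx-HS is based on TeraSort, whose suite consists of a generate step (TeraGen), a sort step (TeraSort) and a validate step (TeraValidate) over 100-byte records ordered by a 10-byte key. The single-process sketch below mirrors only the shape of those three phases; real runs distribute them as Hadoop jobs over very large data sets, and real generated rows are not pure random bytes. The function names are hypothetical.

```python
import os

RECORD_SIZE = 100  # bytes per record, as in TeraSort
KEY_SIZE = 10      # the first 10 bytes of a record are its sort key

def generate(n: int) -> list[bytes]:
    """Generate phase (cf. TeraGen): n random fixed-size records."""
    return [os.urandom(RECORD_SIZE) for _ in range(n)]

def sort_records(records: list[bytes]) -> list[bytes]:
    """Sort phase (cf. TeraSort): order records by their key bytes."""
    return sorted(records, key=lambda r: r[:KEY_SIZE])

def validate(records: list[bytes]) -> bool:
    """Validate phase (cf. TeraValidate): every key must be <= the next key."""
    return all(a[:KEY_SIZE] <= b[:KEY_SIZE] for a, b in zip(records, records[1:]))

if __name__ == "__main__":
    data = generate(10_000)
    result = sort_records(data)
    print(len(result), validate(result))  # prints: 10000 True
```

In the benchmark itself, the sort phase is where the cluster's throughput is measured; the validate phase only guards against an incorrect result being reported as a fast one.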
1. Big Data Management Technologies (cnt)
Relevant Publications
• Performance Evaluation of Enterprise Big Data Platforms with HiBench (In 9th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2015), August 20-22, 2015, Helsinki, Finland)
• Benchmarking the Availability and Fault Tolerance of Cassandra (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)
• Performance Evaluation of Spark SQL using BigBench (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)
• Benchmarking DataStax Enterprise/Cassandra with HiBench (Technical Report No. 2014-2)
• Performance Evaluation of Virtualized Hadoop Clusters (Technical Report No. 2014-1)
• Benchmarking Virtualized Hadoop Clusters (In Proceedings of Big Data Benchmarking: 5th International Workshop, WBDB 2014, Potsdam, Germany, August 5-6, 2014, Revised Selected Papers)
The full list of publications is available online: http://www.bigdata.uni-frankfurt.de/publications/
1. Big Data Management Technologies (cnt)
Member of the Standard Performance Evaluation Corporation (SPEC)
SPEC is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.
The SPEC Research Group (RG) Big Data Working Group is a forum for individuals and organizations interested in big data benchmarking.
List of all 52 Member Organizations
Advanced Strategic Technology LLC
ARM
bankmark UG
Barcelona Supercomputing Center
Charles University
Cisco Systems
Cloudera, Inc
Compilaflows
Delft University of Technology
Dell
fortiss GmbH
Friedrich-Alexander-University Erlangen-Nuremberg
Goethe University Frankfurt
Hewlett-Packard
Huawei
IBM
Imperial College London
Indian Institute of Technology, Bombay
Institute for Information Industry, Taiwan
Institute of Communication and Computer Systems/NTUA
Intel
Karlsruhe Institute of Technology
Kiel University
Microsoft
MIOsoft Corporation
NICTA
NovaTec GmbH
Oracle
Purdue University
Red Hat
RWTH Aachen University
Salesforce.com
San Diego Supercomputing Center
San Francisco State University
SAP AG
Siemens Corporation
Technische Universität Darmstadt
The MITRE Corporation
Umea University
University of Alberta
University of Coimbra
University of Florence
University of Lugano
University of Minnesota
University of North Florida
University of Paderborn
VMware
University of Wuerzburg
University of Texas at Austin
University of Stuttgart *
University of Pavia
2. Graph Databases / Linked Open Data (LOD)
Benchmarking: Berlin SPARQL Benchmark (BSBM)
Linked Open Data (LOD) is structured data published online so that computers can access it automatically. Combining different sources brings together large amounts of similar or related structured data, which can then be queried and analyzed.
This research area is closely related to the Semantic Web and its standards stack, such as RDF* and OWL. We are interested in analyzing and benchmarking existing storage solutions and in applying the idea of LOD to selected applications.
Our current activities are:
• AFE-Web: cooperation project with the Römisch-Germanische Kommission (RGK). Antike Fundmünzen in Europa (Ancient Coin Finds in Europe, AFE-Web) is a web-based database for recording and publishing coin finds.
• European Coin Find Network
• Nomisma.org (Karsten Tolle is a member of the steering committee)
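The idea of combining LOD sources and querying them together can be illustrated with a toy triple store. In the sketch below, each source is a set of (subject, predicate, object) triples, combining sources is a set union, and a query is a conjunction of triple patterns with ?-prefixed variables, roughly what a SPARQL basic graph pattern expresses. The ex: vocabulary and the coin data are hypothetical, and real RDF stores (such as those compared with BSBM) use indexes and query optimizers rather than this nested-loop evaluation.

```python
# Toy triple store: each source is a set of (subject, predicate, object) triples.
# Terms starting with "?" in a query pattern are variables.

def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:   # variable already bound to a different term
                return None
            new[p] = t
        elif p != t:                 # constant term does not match
            return None
    return new

def query(graph, patterns):
    """Evaluate a conjunction of triple patterns (like a SPARQL basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for b in solutions:
            for triple in graph:
                extended = match(pattern, triple, b)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Two hypothetical sources: coin finds and a mint gazetteer ("ex:" is a made-up prefix)
finds = {
    ("ex:coin1", "ex:foundAt", "ex:Frankfurt"),
    ("ex:coin1", "ex:mintedAt", "ex:Rome"),
    ("ex:coin2", "ex:foundAt", "ex:Mainz"),
    ("ex:coin2", "ex:mintedAt", "ex:Trier"),
}
mints = {
    ("ex:Rome", "ex:label", "Rome"),
    ("ex:Trier", "ex:label", "Trier"),
}
graph = finds | mints  # combining sources is a set union of triples

rows = query(graph, [("?coin", "ex:foundAt", "ex:Frankfurt"),
                     ("?coin", "ex:mintedAt", "?mint"),
                     ("?mint", "ex:label", "?name")])
print(rows)  # [{'?coin': 'ex:coin1', '?mint': 'ex:Rome', '?name': 'Rome'}]
```

The third pattern only succeeds because the mint labels come from the second source, which is the point of linking data: neither source alone answers the query.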
3. Data Analytics / Data Science
• Twenty students from UC Berkeley, Stanford University and Goethe University Frankfurt committed to working on the challenges of obesity, heart/lung failure and mood disorders.
• The Frankfurt Big Data Lab uses data acquisition and data blending to improve the quality of analytics.
<Number of patients per postal area: visualization of the given patients’ data>
<Twitter data retrieved by the keyword “obesity”>
<Twitter data retrieved from the state of Pennsylvania>
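Data blending, as used here, combines sources at a shared coarse level (the postal area) instead of joining individual records. A minimal sketch, assuming patient records and tweets that already carry a postal code (real tweets rarely do, so a geocoding step would be needed): count patients with a given diagnosis and keyword-matching tweets per area, then line up both counts. All records, field names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical patient records: (patient_id, postal_code, diagnosis)
patients = [
    (1, "60311", "obesity"),
    (2, "60311", "obesity"),
    (3, "60313", "heart failure"),
    (4, "60313", "obesity"),
]
# Hypothetical tweets, assumed to be already tagged with a postal code
tweets = [
    {"text": "Fighting obesity one step at a time", "postal_code": "60311"},
    {"text": "Great weather in Frankfurt", "postal_code": "60311"},
    {"text": "Obesity programs in our district?", "postal_code": "60313"},
]

def patients_per_area(records, diagnosis):
    """Count patients with the given diagnosis per postal area."""
    return Counter(pc for _, pc, d in records if d == diagnosis)

def tweets_per_area(items, keyword):
    """Count tweets mentioning the keyword (case-insensitive) per postal area."""
    kw = keyword.lower()
    return Counter(t["postal_code"] for t in items if kw in t["text"].lower())

def blend(left, right):
    """Line up two per-area counts; an area missing from one source gets 0."""
    areas = sorted(set(left) | set(right))
    return [(a, left.get(a, 0), right.get(a, 0)) for a in areas]

print(blend(patients_per_area(patients, "obesity"),
            tweets_per_area(tweets, "obesity")))
# [('60311', 2, 1), ('60313', 1, 1)]
```

Because the two sources share no record identifiers, the area is the only join key; the blended counts support comparison, not individual-level conclusions.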
4. Big Data for Social Good
What can the international research community do to ensure that some of the most brilliant big data use cases also have an impact on social issues?
Our motivation is to encourage the international research community to work on Big Data problems that could have a positive social impact on humankind.
<World map of scientific collaborations, 2014>
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT
1.1 million refugees and migrants registered in Germany in 2015
The number of refugees arriving in Frankfurt has increased from 170 to 250 per week.
We will explore whether and how data can be used to create:
• A data product to support inclusion in the city of Frankfurt
• Insights that can be escalated to decision makers in the city of Frankfurt
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT - METHODOLOGY
We will gather data from various sources available in Frankfurt. The challenge is that the flow of
information is not, by nature, well organized.
Techniques
• Data integration: Collect data from multiple sources, including format changes and cleanup of redundant or useless entries. The outcome is a standardized, unified table.
• Data fusion: Integrate imperfect data sources that overlap on a small group of objects.
• Data blending: Allow sources to be imperfect, incomplete, and overlapping on only a few objects or none at all, which requires inspired guesses and generalizations.
• Design Thinking: Create and evaluate new ideas through a human-centric approach to problem solving.
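Data integration and data fusion, as described above, can be illustrated on toy records. In the sketch below, integration maps two hypothetical sources with different field names and date formats onto one schema and drops redundant copies, and fusion merges rows that describe the same object, filling missing values from the other source. The schema, the matching key (name plus date) and the first-non-missing-value rule are assumptions; real fusion usually needs fuzzy matching and explicit conflict resolution.

```python
from datetime import datetime

# Two hypothetical sources describing the same kind of activity in different formats.
source_a = [  # German field name and date format
    {"Name": "Welcome Cafe", "Datum": "05.03.2016", "District": "Bockenheim"},
    {"Name": "Welcome Cafe", "Datum": "05.03.2016", "District": "Bockenheim"},  # redundant copy
]
source_b = [  # ISO dates, different field names, some values missing
    {"title": "welcome cafe", "date": "2016-03-05", "district": None, "contact": "info@example.org"},
    {"title": "Language Class", "date": "2016-03-07", "district": "Gallus", "contact": None},
]

def normalize_a(r):
    """Map a source-A row onto the unified schema."""
    return {"name": r["Name"].strip().lower(),
            "date": datetime.strptime(r["Datum"], "%d.%m.%Y").date().isoformat(),
            "district": r["District"],
            "contact": None}

def normalize_b(r):
    """Map a source-B row onto the unified schema."""
    return {"name": r["title"].strip().lower(), "date": r["date"],
            "district": r["district"], "contact": r["contact"]}

def integrate(*sources):
    """Integration: concatenate normalized rows into one table, dropping exact duplicates."""
    seen, table = set(), []
    for rows in sources:
        for row in rows:
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                table.append(row)
    return table

def fuse(table, key_fields=("name", "date")):
    """Fusion: merge rows about the same object; the first non-missing value per field wins."""
    fused = {}
    for row in table:
        key = tuple(row[f] for f in key_fields)
        merged = fused.setdefault(key, dict(row))
        for field, value in row.items():
            if merged.get(field) is None and value is not None:
                merged[field] = value
    return list(fused.values())

table = integrate([normalize_a(r) for r in source_a], [normalize_b(r) for r in source_b])
for row in fuse(table):
    print(row)
```

Integration leaves three rows (the redundant copy is dropped); fusion then merges the two "welcome cafe" rows into one that has both a district (from source A) and a contact (from source B).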
4. Big Data for Social Good Projects
<Process diagram with the stages Data Understanding, Inclusion Process Understanding, Data Preparation, Data Integration, Modelling, Evaluation, Deployment, and Data Product or Knowledge Delivery; annotated “We are here!” and “Not much data available!”>
We aim to demonstrate that it is feasible to use available data to help, support and possibly guide the process of inclusion for refugees in the city of Frankfurt.
4. Big Data for Social Good Projects
• LOOKING FOR DATA!
• Define an open-source tool and methodology for managing volunteers and activities, and propose it to the AWO organization. THIS DATA IS NOT COLLECTED.
http://lale.help https://volunteer-planner.org
• Retrieve Twitter data about refugees in Frankfurt.
• Coach the two refugees in developing a mobile app.
• In particular, we aim to help refugee children and hope to support them in the process of inclusion in our society.
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT NEEDS
Developers who wish to contribute to this project are encouraged to contact us!
Contact Person: Concha Sanchez-Ocaña, Project Manager. concha@dbis.cs.uni-frankfurt.de
Organizations involved
Frankfurt Big Data Lab, Goethe University Frankfurt
School of Business, University of Applied Sciences Mainz
Research Center SAFE (funded by the State of Hessen initiative for research, LOEWE)
Betriebliche Kommunikationssysteme und IT-Security, University of Applied Sciences Offenburg
THANK YOU!!
Editor's Notes
We are interested in benchmarking new software platforms for storing and processing massive amounts of data, and for analytics beyond what conventional relational systems can do.
We are interested in testing such systems against domain-specific workloads for data clustering, predictive modeling, and complex statistics. In addition, we are investigating graph-based DBMSs for social-network-style analysis.