Workshop Rio de Janeiro Strategies for Web Based Data Dissemination – Zoltan Nagy
Strategies for effective web-based data dissemination include identifying different types of users like tourists, harvesters, and builders and tailoring content and features to their needs. An optimal strategy considers technical aspects like platforms, hosting, and design as well as administrative aspects like content management, user support, and resource allocation to balance costs and usability. The goal is to facilitate two-way communication through data access and promote statistical knowledge.
High Performance Data Analysis (HPDA): HPC - Big Data Convergence – inside-BigData.com
In this video from the HPC User Forum in Santa Fe, Steve Conway from Hyperion Research presents: High Performance Data Analysis (HPDA): HPC - Big Data Convergence.
"To date, most data-intensive HPC jobs in the government, academic and industrial sectors have involved the modeling and simulation of complex physical and quasi-physical systems. The systems range from product designs for cars, planes, golf clubs and pharmaceuticals, to subatomic particles, global weather and climate patterns, and the cosmos itself. But from the start of the supercomputer era in the 1960s — and even earlier —an important subset of HPC jobs has involved analytics — attempts to uncover useful information and patterns in the data itself. Cryptography, one of the original scientific-technical computing applications, falls predominantly into this category."
Watch the video: http://wp.me/p3RLHQ-gHm
Learn more: http://hpcuserforum.com
Emerging Trends in Data Visualization and Dissemination discusses providing statistical data through application programming interfaces (APIs) and as a service rather than as goods. It describes how mashups combine data from multiple sources into new applications and services. The document outlines the benefits of mashups, how they work by retrieving data through APIs from different websites, and factors to consider when planning a mashup, such as data sources and programming languages. It provides examples of the United Nations' UNData and Comtrade initiatives, which make international statistical databases freely available through APIs and web services.
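As a rough illustration of the mashup pattern described above, the sketch below pulls JSON from two web APIs and joins the records on a shared country code. The endpoint URLs and field names are hypothetical, not the actual UNData or Comtrade interfaces.

```python
# Minimal mashup sketch: combine two hypothetical statistical APIs by country code.
# The endpoint URLs and field names below are illustrative, not real services.
import requests

POPULATION_API = "https://stats.example.org/api/population"   # hypothetical
TRADE_API = "https://trade.example.org/api/exports"           # hypothetical

def fetch_json(url, params=None):
    """Retrieve a JSON payload from a web API, raising on HTTP errors."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

def build_mashup(year):
    """Join per-country population and export figures into one record set."""
    population = fetch_json(POPULATION_API, {"year": year})
    exports = fetch_json(TRADE_API, {"year": year})
    exports_by_country = {row["country_code"]: row["exports_usd"] for row in exports}
    combined = []
    for row in population:
        code = row["country_code"]
        if code in exports_by_country:  # keep only countries present in both sources
            combined.append({
                "country": code,
                "population": row["population"],
                "exports_usd": exports_by_country[code],
                "exports_per_capita": exports_by_country[code] / row["population"],
            })
    return combined

if __name__ == "__main__":
    for record in build_mashup(2015):
        print(record)
```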
The document discusses the benefits of open government data including cutting red tape, improving government operations through collaboration and better policy analysis, and enabling innovation. It provides an overview of the Australian government's open data policies and strategies, which aim to publish data by default and support data reuse. Challenges around privacy, skills, and changing culture are also addressed. The strategies outline a vision of using open data and analytics to enhance government services and create new opportunities through collaboration with industry and researchers.
The document discusses CloudBank, a project launched by the National Science Foundation to provide a centralized entity for managing researchers' access to cloud computing resources from major providers like AWS in order to simplify usage and promote sharing of knowledge, software, and data across the research community. CloudBank aims to establish partnerships with cloud providers, provide user support, training and guidance to researchers, and act as an intermediary to facilitate resource allocation and billing as part of NSF research grants.
Continuous Intelligence: Keeping your AI Application in Production – Dr. Arif Wider
A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery has rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence, the practice of delivering AI applications continuously.
20141030 LinDA Workshop echallenges2014 - LinDA project overview – LinDa_FP7
The LinDA project aims to provide tools for small and medium enterprises to access and analyze public sector information. The project will develop a transformation engine to convert data into semantic formats, a repository for linked data vocabularies, a linked data API, visualization tools, and analytics applications. These tools will help SMEs integrate public and private data sources to discover new patterns and develop innovative business models. The goal is to motivate more publication and use of open government data using semantic web standards.
In this deck from Switzerland HPC Conference, Michael Feldman from TOP500.org presents an annual deep dive into the trends, technologies and usage models that will be propelling the HPC community through 2017 and beyond.
"Emerging areas of focus and opportunities to expand will be explored along with insightful observations needed to support measurably positive decision making within your operations."
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ... – Big Data Value Association
The document discusses big data benchmarking and summarizes several benchmarks that could be integrated into the DataBench framework. It describes benchmarks like HiBench, SparkBench, YCSB, BigBench, and ABench that evaluate different aspects of big data systems like micro-benchmarks, streaming, and end-to-end workflows. The goal of DataBench is to provide a methodology and tools for benchmarking, including accessing multiple benchmarks, homogenizing metrics, and deriving business KPIs to help practitioners evaluate big data platforms and technologies.
Industry@RuleML2015: Norwegian State of Estate A Reporting Service for the St... – RuleML
Data distribution
• Public and private
• Data complexity
• Rich in attributes and location based
• Time dimension
• Example of data model from the Norwegian Mapping Authority
The problem of radicalisation is very high on the European agenda as increasing numbers of young European radicals return from Syria and use the internet to disseminate propaganda. To enable policy makers to design policies that address radicalisation effectively, the Policy Cloud consortium will collect data from social media and other sources, including the open-source Global Terrorism Database (GTD), the Onion City search engine, which accesses data on TOR dark-web sites, and Twitter (through Firehose). The data will be analysed using sentiment analysis and opinion mining software.
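The summary names sentiment analysis only in general terms; as one hedged illustration of what such a step could look like, the snippet below scores short texts with NLTK's VADER analyzer (an assumed tool choice, not one named in the document).

```python
# Illustrative sentiment scoring of collected posts with NLTK's VADER analyzer.
# The document does not name a specific tool; VADER is just one common choice.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

posts = [
    "This community project brought people together in a great way.",
    "The propaganda spreading online is alarming and dangerous.",
]

analyzer = SentimentIntensityAnalyzer()
for post in posts:
    scores = analyzer.polarity_scores(post)  # neg/neu/pos plus compound in [-1, 1]
    label = "positive" if scores["compound"] > 0.05 else (
        "negative" if scores["compound"] < -0.05 else "neutral")
    print(f"{label:8s} {scores['compound']:+.2f}  {post}")
```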
This document provides an introduction to Cerved, an Italian company that collects and analyzes business and financial data. It summarizes Cerved's key business areas including documents and data searches, credit scoring reports, and analysis of Italian groups. It also describes Cerved's data sources and infrastructure called the Cerved Factory, which processes large amounts of data daily. The document then discusses Cerved's vision for open data, linking proprietary and open data sources to create smart data and new products and services. It provides examples of Cerved's use of open data from public projects to enhance transparency and risk analysis. Finally, it outlines some issues with open data quality, integration, and costs that Cerved addresses in realizing benefits from open data.
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 – DataBench
The document discusses big data benchmarking and outlines the goals of the DataBench project. It aims to develop a toolbox for both technical and business benchmarks following a holistic benchmarking approach. The toolbox will integrate existing benchmarking initiatives and identify gaps to contribute new benchmarks. It will provide a way to derive metrics and key performance indicators from benchmarks in a homogenized way. The toolbox will include a web interface for users to specify benchmarking requirements.
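Since the summary's key idea is deriving metrics and KPIs from heterogeneous benchmarks in a homogenized way, here is a minimal sketch of that idea: every workload, whatever it measures internally, is reduced to the same metric record. It is a generic illustration, not DataBench's actual toolbox.

```python
# Generic sketch of "homogenized" benchmark metrics: each workload is reduced to
# one uniform metric dictionary so results can be compared and rolled up into
# KPIs. An illustration of the idea only, not DataBench's actual toolbox.
import time

def run_benchmark(name, workload, records_processed):
    """Time a workload and report a uniform metric record."""
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    return {
        "benchmark": name,
        "elapsed_s": elapsed,
        "records": records_processed,
        "throughput_rps": records_processed / elapsed if elapsed > 0 else float("inf"),
    }

def sort_workload():
    sorted(range(1_000_000, 0, -1))

def sum_workload():
    sum(range(5_000_000))

results = [
    run_benchmark("micro-sort", sort_workload, 1_000_000),
    run_benchmark("micro-sum", sum_workload, 5_000_000),
]
for r in results:
    print(f'{r["benchmark"]:10s} {r["elapsed_s"]:.3f}s  {r["throughput_rps"]:,.0f} records/s')
```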
proDataMarket presentation at "Spatial Data on The Web" – dapaasproject
Presentation at the "Spatial Data on The Web" event, 10th of February 2016, Amersfoort, The Netherlands
http://www.pilod.nl/wiki/Geodata_on_The_Web_Event_10_February_2016
DataGraft: Data-as-a-Service for Open Data – dapaasproject
The document provides an overview of Linked (Open) Data including RDF, RDFS and SPARQL. It defines key concepts such as Linked Data principles of using URIs to identify things on the web and describing relationships between them. It describes RDF's basic data model of subject-predicate-object triples to make statements about resources and the RDF serialization formats of Turtle and JSON-LD. It also mentions semantic query language SPARQL for querying RDF data.
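The concepts above are easy to make concrete. The sketch below, using the rdflib library (an assumed choice, not named in the document), parses a few Turtle-encoded subject-predicate-object triples and runs a SPARQL query over them.

```python
# Small demonstration of Linked Data concepts with rdflib (used here only for
# illustration): a Turtle-encoded triple set queried with SPARQL.
from rdflib import Graph

turtle_data = """
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice foaf:name "Alice" ;
         foaf:knows ex:bob .
ex:bob   foaf:name "Bob" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")  # subject-predicate-object triples

# SPARQL: find the names of everyone Alice knows.
query = """
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
    ex:alice foaf:knows ?person .
    ?person foaf:name ?name .
}
"""
for row in g.query(query):
    print(row.name)  # -> Bob
```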
How Government Agencies are Using MongoDB to Build Data as a Service Solutions – MongoDB
The document discusses how government agencies are using MongoDB to build Data as a Service (DaaS) solutions. It provides examples of the Department of Veterans Affairs using MongoDB for its VLER program to share veteran records, the Consumer Financial Protection Bureau using it for an open data platform, and the FCC using it for a mobile broadband speed test program. It also mentions the city of Chicago's use of MongoDB for a predictive analytics program called Windy Grid.
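As a hedged sketch of the DaaS pattern these agencies are described as using, the snippet below ingests a few documents into MongoDB with pymongo and serves an aggregate query. The connection string, database and field names are illustrative assumptions, not details of any agency system.

```python
# Sketch of a minimal data-as-a-service pattern with MongoDB via pymongo.
# Connection string, database and field names are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # assumed local instance
collection = client["open_data"]["broadband_tests"]

# Ingest: documents can carry heterogeneous fields without a fixed schema.
collection.insert_many([
    {"city": "Chicago", "download_mbps": 42.1, "provider": "A"},
    {"city": "Chicago", "download_mbps": 17.8, "provider": "B"},
    {"city": "Denver", "download_mbps": 55.0, "provider": "A"},
])

# Serve: a typical DaaS-style query, aggregated server-side.
pipeline = [
    {"$group": {"_id": "$city", "avg_download": {"$avg": "$download_mbps"}}},
    {"$sort": {"avg_download": -1}},
]
for doc in collection.aggregate(pipeline):
    print(doc["_id"], round(doc["avg_download"], 1))
```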
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation – Pentaho
Preview of the Strata + Hadoop World Strata San Jose 2016 session about truly scalable and automated data onboarding for Hadoop
Attend the presentation at the conference to learn how to tackle repeatable, self-service Hadoop ingestion without coding
Filling the Data Lake
Thursday, March 31 11:50a-12:30p
Room 230B
http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/50677
This document discusses the shift towards open data in government. It provides an overview of the key benefits of open data to both government and the community. These include improved services, efficiency gains, better policy outcomes, and more opportunities for innovation and collaboration. The document also outlines Australia's open data policy landscape at both the federal and state/territory levels. It describes initiatives like data.gov.au and the NationalMap portal that aim to improve data discovery, access, and use. Challenges of implementing open data strategies are also discussed.
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning – Cambridge Semantics
This EDM Council webinar, sponsored by Cambridge Semantics Inc. and featuring FI Consulting, explores the challenges common to a risk analytics pipeline, application of graph analytics to mortgage loan data and use cases in adjacent areas including customer service, collections, fraud and AML.
The document discusses using SDN (software-defined networking) to improve big data applications. SDN can optimize data flows within and between data centers to improve the efficiency of big data tasks like transferring large files. By controlling data flows, an SDN controller can prioritize large transfers and handle varying traffic patterns to help big data applications run smoothly. Open issues that remain include scalable controller management and intelligent flow table and rule management.
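A toy sketch of the prioritization idea follows: a controller that can see whole flows classifies large "elephant" transfers onto a bulk lane so interactive traffic is not starved. The flow classes and threshold are invented for illustration; a real controller would install OpenFlow-style rules rather than manipulate plain Python objects.

```python
# Toy illustration of SDN-style flow scheduling: an omniscient controller gives
# large bulk transfers a low-priority lane so latency-sensitive traffic is not
# starved. Thresholds and classes are invented for the example.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    bytes_pending: int
    latency_sensitive: bool

ELEPHANT_THRESHOLD = 100 * 1024 * 1024  # 100 MB: treat as a bulk transfer

def assign_priority(flow: Flow) -> int:
    """Return a queue priority: 0 = highest (interactive), 2 = bulk."""
    if flow.latency_sensitive:
        return 0
    if flow.bytes_pending >= ELEPHANT_THRESHOLD:
        return 2  # schedule on the bulk lane, possibly over a longer path
    return 1

flows = [
    Flow("app-1", "db-1", 4_096, latency_sensitive=True),
    Flow("hdfs-a", "hdfs-b", 2_000_000_000, latency_sensitive=False),
]
for f in flows:
    print(f.src, "->", f.dst, "priority", assign_priority(f))
```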
DataGraft is a platform and set of tools that aims to make open and linked data more accessible and usable. It allows users to interactively build, modify, and share repeatable data transformations. Transformations can be reused to clean and transform spreadsheet data. Data and transformations can be hosted and shared in a cloud-based catalog. DataGraft provides APIs, reliable data hosting, and visualization capabilities to help data publishers share datasets and enable application developers to more easily build applications using open data.
1. Graphs add predictive power to machine learning models by incorporating network structure and relationships between entities.
2. Building graph machine learning models involves aggregating data from various sources to construct a graph, engineering graph features using algorithms and embeddings (see the sketch after this list), and training predictive models that leverage the graph structure.
3. Graph algorithms, embeddings, and neural networks are increasingly being used to power applications in domains like financial services, healthcare, cybersecurity, and more by enabling novel and more accurate predictions based on relationships in data.
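A minimal sketch of step 2 above: compute simple graph features with networkx and train an ordinary classifier on them. The graph and labels come from networkx's bundled karate-club example, chosen purely for illustration.

```python
# Sketch of graph feature engineering: derive per-node features (degree and
# PageRank centrality) with networkx and feed them to a standard classifier.
# The toy graph and labels are networkx's bundled example data.
import networkx as nx
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()  # classic small social graph bundled with networkx
pagerank = nx.pagerank(G)

# One feature row per node: [degree, pagerank]
X = [[G.degree(n), pagerank[n]] for n in G.nodes]
# Labels from the node attribute shipped with the dataset (two communities).
y = [1 if G.nodes[n]["club"] == "Officer" else 0 for n in G.nodes]

model = LogisticRegression().fit(X, y)
print("training accuracy:", model.score(X, y))
```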
Graph Databases and Graph Data Science in Neo4j – ijtsrd
The document discusses graph databases, Neo4j graph database software, and graph data science algorithms. It provides an overview of graph databases and their components like nodes, edges, and properties. It then describes Neo4j's features including querying, visualization, hosting options, and the Graph Data Science library. Finally, it explains different types of graph data science algorithms in Neo4j like centrality, similarity, and pathfinding algorithms and provides an example of each.
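As a hedged sketch of invoking one such algorithm from code, the snippet below calls the Graph Data Science PageRank procedure through the official Neo4j Python driver. It assumes a running Neo4j instance with the GDS plugin installed, the credentials shown, and an already-projected in-memory graph named 'myGraph'; all of these are assumptions for illustration.

```python
# Hedged sketch: stream PageRank scores from Neo4j's Graph Data Science library.
# Assumes a running Neo4j with the GDS plugin, these credentials, and an
# in-memory graph already projected under the (hypothetical) name 'myGraph'.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CENTRALITY_QUERY = """
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 5
"""

with driver.session() as session:
    for record in session.run(CENTRALITY_QUERY):
        print(record["name"], round(record["score"], 3))

driver.close()
```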
The Census Hub project can currently be considered the most advanced project in which Internet technologies and SDMX solutions for data transmission come together for an ambitious goal: the dissemination of Census 2011 results.
We analyze the Census Hub architecture, in which a central hub on the Eurostat side manages the user interface, transforming all selections the user makes on screen into an SDMX query. This query is sent to the web service on the NSI side, which parses it and transforms it into an SQL query that can be run against a database containing census data. Depending on how many countries are involved in the answer, the hub queries the web service provided by each of those countries. Finally, the hub receives all the answers from the NSIs and builds a final table that puts them together. The importance of this implementation is that it is a completely new system that fundamentally changes the way official data are disseminated and exchanged among organizations.
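A highly simplified sketch of that translation chain (user selection to SDMX-style query to per-country SQL) is given below; every name and field in it is hypothetical, and the real SDMX machinery is far richer.

```python
# Highly simplified sketch of the hub-to-NSI flow described above. The SDMX
# query is reduced to a plain dictionary and the NSI side to a string-built SQL
# statement; every name here is hypothetical and only shows the shape of the
# chain: user selection -> SDMX-style query -> SQL per country.

def hub_build_query(user_selection):
    """Eurostat hub side: turn on-screen selections into an SDMX-like query."""
    return {
        "dataflow": "CENSUS_2011",
        "dimensions": {
            "GEO": user_selection["countries"],
            "AGE": user_selection["age_group"],
            "SEX": user_selection["sex"],
        },
    }

def nsi_to_sql(sdmx_query, country):
    """NSI side: parse the query and express it against a local census table."""
    dims = sdmx_query["dimensions"]
    return (
        "SELECT age, sex, population FROM census_2011 "
        f"WHERE country = '{country}' "
        f"AND age_group = '{dims['AGE']}' AND sex = '{dims['SEX']}'"
    )

selection = {"countries": ["AT", "PT"], "age_group": "Y25-34", "sex": "F"}
query = hub_build_query(selection)
# The hub fans the query out to one web service per selected country,
# then assembles the per-country answers into a single result table.
for country in query["dimensions"]["GEO"]:
    print(nsi_to_sql(query, country))
```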
PDU 214 Methods of Observation & Interviewing: Observation - Methods & Record... – Agatha N. Ardhiati
1. The document outlines the process of observation, including preparation, collecting information, and summarizing/interpreting.
2. In preparation, the observer must determine purpose, who/what/where/when to observe, and how to record information. Different types of records like checklists and rating scales can be used.
3. To collect data, the observer should be unobtrusive and gather a large amount of data over a long period in various situations.
4. After collecting information, the observer summarizes and interprets the data to make appropriate decisions based on the observational purpose. Problems like biases are addressed through techniques like using multiple observers.
Sharilynn McIntosh is a motivational speaker who gives presentations on relationships and overcoming rejection. Her presentation "LIVE, LOVE & LAUGH: WHAT RELATIONSHIPS TEACH US & HOW TO AVOID THE RISK OF GETTING HURT" teaches that good relationship choices come from a good relationship with God and shares lessons from successful and failed relationships. She also gives a presentation called "OVERCOMING REJECTION: HOW TO EXPERIENCE RELATIONSHIP SUCCESS" that provides strategies for healing from rejection and cultivating new, positive relationships.
Faixa de Areia Brasil - B - Documentary film – Monique Bodin
The documentary Faixa de Areia Brasil will investigate the habits of beachgoers on Brazilian beaches, including what they eat, which sports they play, and what they sell. Directors Daniela Kallmann and Flávia Lins e Silva will film in nine states between 2012 and 2013 to produce a feature film showing Brazil's cultural diversity.
The document talks about counting things in groups of five and the association of counting with the fingers of the hands. It then mentions counting enough objects to fill the hands and putting those objects in a one-litre bottle. The rest of the text contains randomly generated phrases that are unrelated to the initial topic.
Manual eventos civicos (este archivo es muy necesario en nuestros C.T. me lo ... – Aurora Acosta
This document provides guidelines for conducting school civic ceremonies. It describes the procedures for the weekly flag-honours ceremony, including raising the flag and singing the national anthem. It also explains the protocol for the end-of-year ceremony, in which the flag escort is handed over. In addition, it gives instructions on the organisation and duties of a flag escort during ceremonies.
YOU are a company or individual brokering real-estate projects, but:
• After a long time you still have no transactions
• Where are the prospective customers now, and how do you reach them?
The answer is the "ALL-INCLUSIVE REAL-ESTATE CUSTOMER SUPPLY SOLUTION" from DREAM PARADISE MEDIA
Commitments
• Supply the full number of customers stated in the contract
• Draw up a clear communication plan and deliver the message to target customers
• Draw up a plan for tracking, measurement and weekly reporting
• Guarantee that prospective customers are interested in the project, not brokers
• Prospective customers are qualified against these criteria: full name, phone number, email, suitable finances, awareness of the project's location, a clear need
• Commitment to hand over newly generated customers within 3 hours
Projects currently underway:
• Vinhomes Central Park
• Vinhomes Golden River
• Moonlight ParkView
• Moonlight Boulevard
• Everich Infinity
• The Gold View
• Repuplic Plaza
• Citi Soho
• River City
• Novaland projects
• Vincity
• Several other projects in Ho Chi Minh City
---------------------------------
Contact now: 0888 12 89 12 – 0901 454 002
Website: Dreamparadisemedia.vn
Email: info@dreamparadisemedia.vn
This document explains the difference between costs and expenses for a company. Costs refer to the investments needed to produce a good or service and include fixed costs, which are paid regardless of the production level, and variable costs, which change depending on production. Expenses are outlays for distributing and administering the processes related to selling, such as marketing, and do not directly generate income. While costs involve assets, expenses are payments for services.
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna... – Flávio Secchieri Mariotti
The document discusses the benefits and challenges of Big Data & Analytics for companies on the Digital Transformation journey. It presents how Big Data can bring customer intelligence, smarter operations and faster product innovation, but also addresses the challenges of managing the data life cycle and the need for strategy and focus in projects.
Diplomado en gestion de proyectos e-learning – danielcriollo
This document presents strategies and policies for developing an information society in Ecuador, including the use of information and communication technologies (ICTs) in education, access to information, and social development. It proposes guaranteeing access to ICTs in rural and marginalised urban areas and strengthening resources for the development of telecommunications. It also recommends implementing multidisciplinary policies to guarantee the proper use of ICTs from childhood.
Judd Bagley gives insight into the future of the big data revolution and where he sees the industry going in 2017. Visit Judd's website at http://www.juddbagley.com
Author: Stefan Papp, Data Architect at "The unbelievable Machine Company". An overview of Big Data Processing engines with a focus on Apache Spark and Apache Flink, given at a Vienna Data Science Group meeting on 26 January 2017. The following questions are addressed (a short illustrative sketch follows the list):
• What are big data processing paradigms and how do Spark 1.x/Spark 2.x and Apache Flink solve them?
• When to use batch and when stream processing?
• What is a Lambda-Architecture and a Kappa Architecture?
• What are the best practices for your project?
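As one way to make the batch-versus-stream question from the list above concrete, here is a minimal PySpark sketch that runs the same word count once over a bounded file and continuously over a socket stream. The input path and socket source are illustrative.

```python
# Minimal sketch of the batch/stream distinction using PySpark (one of the
# engines discussed). The file path and the socket source are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("batch-vs-stream").getOrCreate()

# Batch: a bounded input, processed once to completion.
batch_words = (
    spark.read.text("input.txt")
    .select(explode(split("value", r"\s+")).alias("word"))
    .groupBy("word").count()
)
batch_words.show()

# Stream: the same logic over an unbounded source, updated continuously.
stream_words = (
    spark.readStream.format("socket")
    .option("host", "localhost").option("port", 9999).load()
    .select(explode(split("value", r"\s+")).alias("word"))
    .groupBy("word").count()
)
query = (
    stream_words.writeStream.outputMode("complete")
    .format("console").start()
)
query.awaitTermination()
```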
Data Mining, Predictive Analytics and Big Data - Course information Spring 2017 – Andrés Fortino, PhD
Invitation to an NYU online seminar for Spring 2017 - Gain an overview of the collection, analysis, and visualization of complex data, as well as the relevant pivotal concepts.
Statistics is the science of collecting, processing, analysing and presenting data so that it becomes information useful for decision-making. It consists of descriptive statistics, for summarising and presenting data, and inferential statistics, for interpreting and drawing conclusions from sample data.
5 facts everyone should know about big data presentation – Infobrandz
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
The document discusses the importance of data for evidence-based policymaking, organizational development, detecting security issues, and improving business outcomes. It provides examples of how New Zealand Registry Services (NZRS) uses data for these purposes, including operating a national broadband map and open data portals. The document advocates for making more data openly available to enable reproducible research, more informed policy debates, and increased public trust.
The document is a business feasibility analysis for opening a new branch of a motorcycle service workshop in East Bekasi, using four methods: payback period, net present value, profitability index and internal rate of return. The calculations show that the business is feasible.
Social Media Market Trender with Dache Manager Using Hadoop and Visualization... – IRJET Journal
This document proposes using Apache Hadoop and a data-aware cache framework called Dache to analyze large amounts of social media data from Twitter in real-time. The goals are to overcome limitations of existing analytics tools by leveraging Hadoop's ability to handle big data, improve processing speed through Dache caching, and provide visualizations of trends. Data would be grabbed from Twitter using Flume, stored in HDFS, converted to CSV format using MapReduce, analyzed using Dache to optimize Hadoop jobs, and visualized using tools like Tableau. The system aims to efficiently analyze social media trends at low cost using open source tools.
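As a hedged sketch of the MapReduce stage in such a pipeline, the snippet below is a Hadoop Streaming style mapper/reducer pair in Python that counts hashtags; the document describes the MapReduce step only generically, so the hashtag-count task is an assumed example.

```python
# Hadoop Streaming style mapper/reducer pair in one file: run "python job.py map"
# as the mapper and "python job.py reduce" as the reducer. Hashtag counting is
# an assumed example task; the document only says MapReduce is used for
# conversion and analysis.
import sys

def mapper():
    # Emit "hashtag<TAB>1" for every hashtag in each tweet line on stdin.
    for line in sys.stdin:
        for token in line.split():
            if token.startswith("#"):
                print(f"{token.lower()}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so counts accumulate per hashtag.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Run under Hadoop Streaming, the mapper and reducer read stdin and write stdout, with the framework sorting mapper output by key between the two phases.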
The Big Data Importance – Tools and their Usage – IRJET Journal
This document discusses big data, tools for analyzing big data, and opportunities that big data analytics provides. It begins by defining big data and its key characteristics of volume, variety and velocity. It then discusses tools for storing, managing and processing big data like Hadoop, MapReduce and HDFS. Finally, it outlines how big data analytics can be applied across different domains to enable new insights and informed decision making through analyzing large datasets.
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa... – Tomasz Bednarz
Presented at the ACEMS workshop at QUT in February 2015.
Credits: whole project team (names listed in the first slide).
Approved by CSIRO to be shared externally.
The document provides an overview of the Dublinked Technology Workshop held on December 15th, 2011. It includes presentations on transportation data, spatial web services, linked data, and semantic data description. Breakout sessions covered topics like data publishing, discovery, web services, and advanced functions. The workshop aimed to address challenges around sharing digital data between organizations and discussed technical requirements and tools to support open government data platforms.
These practice guidelines are for those who manage big-data and big-data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders who are responsible for developing agency capability in the area of big data and big data analytics.
For those agencies currently not using big data or big data analytics, this document may assist strategic planners, business teams and data analysts to consider the value of big data to the current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or do big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information on technical aspects of big data and big data analytics, including achieving best practice with modeling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence
This document discusses challenges and solutions related to big data implementation. Some key challenges mentioned include reluctance to invest in big data strategies, integrating traditional and big data, and finding professionals with both big data and domain skills. The document recommends starting small with proofs of concept and taking an iterative approach to derive early benefits from big data before making larger investments. It also stresses the importance of having an enterprise-wide data strategy and acquiring various skills needed for big data projects.
Standard Safeguarding Dataset - overview for CSCDUG.pptx – RocioMendez59
13 July, 2023 - CSCDUG Online Event
Presenting the Sector-led Standard Safeguarding Dataset
Colleagues from Data to Insight, the LA-led service for children’s safeguarding data professionals, are delivering a DfE-funded project in partnership with LAs to define a new “standard safeguarding dataset” which all LAs will be able to produce from their safeguarding information systems.
At this session, they shared what they’ve learned so far from user research with LA colleagues and discussed their early thinking about what a better standard dataset might look like. Participants shared their own thoughts about how to improve these systems and processes.
Presenters
Alistair Herbert
Alistair is the lead officer for Data to Insight, the LA-led service for children’s safeguarding data professionals. With a career focused on local authority children’s services data work, he knows about safeguarding data, information systems, and cross-organisation collaboration.
John Foster
John is a Data Manager for Data to Insight. He has supported a range of children’s services data work, most recently at Shropshire Council. He led Data to Insight’s project to introduce the first national benchmarking dataset for Early Help, and is the user research lead for Data to Insight’s Standard Safeguarding Dataset project.
Rob Harrison and Joe Cornford-Hutchings
Rob and Joe are new Data Managers joining Data to Insight from the private and public sector respectively. They bring between them a wealth of experience and technical expertise, and will be working together to support design and implementation of the new Standard Safeguarding Dataset through 2023-24.
ESSnet Big Data WP8 Methodology (+ Quality, +IT) – Piet J.H. Daas
1. The documents discuss methodology, quality, and IT aspects of big data within the ESSnet Big Data project.
2. Key topics addressed include the big data processing lifecycle, metadata management challenges, and quality aspects like coverage, accuracy, and comparability over time.
3. Common themes that emerged across work packages include the need for a unified framework for data integration and metadata, and the value of shared software and training resources.
SC6 Workshop 1: Big Data Europe platform requirements and draft architecture:... – BigData_Europe
Presentation by Martin Kaltenböck, Semantic Web Company, at the first workshop of Societal Challenge 6 in the BigDataEurope project, taking place in Luxembourg on 18 November 2015.
http://www.big-data-europe.eu/social-sciences/
This document provides an overview and strategy for big and fast data initiatives in 2017. It discusses the data landscape including volume, velocity, variety and validity. It evaluates different data platform technologies and outlines requirements. The vision is described as "Business Insights at the Speed of Light". The strategy focuses on speed and leveraging key technologies like Spark. A roadmap with initiatives around insights, infrastructure, ingestion and big BI is presented. High level architectures for streaming and data flow are shown. Finally, data preparation vendors are compared.
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou... – IRJET Journal
This document discusses big data analytical methods, cloud computing, and how they can be combined. It explains that big data involves large amounts of structured, semi-structured, and unstructured data from various sources that requires significant computing resources to analyze. Cloud computing provides a way for big data analytics to be offered as a service and processed efficiently using cloud resources. The integration of big data and cloud computing allows organizations to gain business intelligence from large datasets in a flexible, scalable and cost-effective manner.
This document provides an overview of big data and big data analytics. It defines big data as large, complex datasets that grow quickly in volume and variety. Big data analytics involves examining these large datasets to find patterns and useful information. The challenges of big data include increased storage needs and handling diverse data formats. Hadoop is a framework that allows distributed processing of big data across clusters of computers. Common big data analytics tools include MapReduce, Spark, HBase and Hive. The benefits of big data analytics include improved decision making, customer service and efficiency.
Watch full webinar here: https://bit.ly/3mdj9i7
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received the most attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
In this webinar, we will discuss the technology trends that will drive the enterprise data strategies in the years to come. Don't miss it if you want to keep yourself informed about how to convert your data to strategic assets in order to complete the data-driven transformation in your company.
Watch this on-demand webinar as we cover:
- The most interesting trends in data management
- How to build a data fabric architecture?
- How to manage your data integration strategy in the new hybrid world
- Our predictions on how those trends will change the data management world
- How can companies monetize the data through data-as-a-service infrastructure?
- What is the role of voice computing in future data analytics
GERSIS is a software development company that provides various software solutions and services. The document describes several case studies of projects completed by GERSIS, including a decision making support system for a European bank, a search platform for a Danish software company, and a sales planning tool for a European cosmetics manufacturer. The case studies describe the challenges, solutions developed, technologies used, and timelines for each project.
This document discusses big data analytics and analytical platforms. It finds that companies have been storing and analyzing large volumes of data for decades, but new types of structured, semi-structured, and unstructured data from sources like the web and sensors are fueling even greater amounts of "big data". Analytical platforms have emerged to help organizations efficiently store and analyze this data. The report is based on a survey of 302 IT professionals and interviews with BI experts.
The Underutilization of GIS technologies - Q&A with Shane Barrett – IQPC Australia
In this Q&A, Shane Barrett, Manager Spatial Data Quality at BG Group, opens up on GIS technologies in today's environment. He discusses strategies, methodologies, key challenges and the current state of GIS in mining operations in the industry.
Shane is speaking at the GIS in Mining and Exploration 2011. For more information about this event, please visit www.gisinmining.com.au or contact us via Twitter (@MiningIQ) or call us on +61 2 9229 1000. Or you can email enquire@iqpc.com.au
Memory Management in BigData: A Perpective View – ijtsrd
The requirement to perform complicated statistical analysis of big data in engineering, scientific research, health care, commerce, banking and computer research institutions is immense. However, the limitations of widely used desktop software like R, Excel, Minitab and SPSS restrict a researcher's ability to deal with big data, and big data analytic tools like IBM BigInsights, HP Vertica, SAP HANA and Pentaho come at an overpriced license. Apache Hadoop is an open source distributed computing framework that uses commodity hardware. With this project, I intend to combine Apache Hadoop and the R software to develop an analytic platform that stores big data (using open source Apache Hadoop) and performs statistical analysis (using open source R software). Due to the limits of vertically scaling a single computer, data storage is handled by several machines, and so analysis becomes distributed over all of them; Apache Hadoop is what comes in handy in this environment. To store the massive quantities of data researchers require, we could use commodity hardware and perform the analysis in a distributed environment. Bhavna Bharti | Prof. Avinash Sharma, "Memory Management in BigData: A Perpective View", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4, June 2018. URL: http://www.ijtsrd.com/papers/ijtsrd14436.pdf http://www.ijtsrd.com/engineering/computer-engineering/14436/memory-management-in-bigdata-a-perpective-view/bhavna-bharti
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization – Denodo
Watch full webinar here: https://buff.ly/2HMdbUp
What started out as the most agile and real-time enterprise data fabric is proving to go beyond its initial promise: data virtualization is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is,
• How it differs from other enterprise data integration technologies
• Real-world examples of data virtualization in action from companies such as Logitech, Autodesk and Festo.
Similar to Mapping presentation THAG big data from space
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize the carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA) pavement, however, RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte... – University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Introduction – e-waste – definition – sources of e-waste – hazardous substances in e-waste – effects of e-waste on environment and human health – need for e-waste management – e-waste handling rules – waste minimization techniques for managing e-waste – recycling of e-waste – disposal treatment methods of e-waste – mechanism of extraction of precious metals from leaching solution – global scenario of e-waste – e-waste in India – case studies.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS – IJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threats and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
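As a generic illustration of the likelihood-and-impact style of scoring such an assessment typically uses, the sketch below rates a few invented smart-irrigation assets and maps scores to coarse treatments; it does not reproduce the paper's own methodology.

```python
# Generic likelihood-times-impact risk scoring sketch for the kind of asset
# inventory described above. The assets, ratings and thresholds are invented
# for illustration; the paper's own methodology is not reproduced here.
ASSETS = {
    "soil-moisture sensor": {"likelihood": 4, "impact": 3},   # 1 (low) .. 5 (high)
    "irrigation actuator":  {"likelihood": 2, "impact": 5},
    "telemetry gateway":    {"likelihood": 3, "impact": 4},
}

def treatment(score):
    """Map a risk score (1-25) to a coarse treatment recommendation."""
    if score >= 15:
        return "mitigate immediately"
    if score >= 8:
        return "plan mitigation"
    return "accept / monitor"

for asset, rating in sorted(
        ASSETS.items(),
        key=lambda kv: kv[1]["likelihood"] * kv[1]["impact"],
        reverse=True):
    score = rating["likelihood"] * rating["impact"]
    print(f"{asset:24s} risk={score:2d}  ->  {treatment(score)}")
```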
ACEP Magazine edition 4th launched on 05.06.2024 – Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on lifetime achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Literature Review Basics and Understanding Reference Management.pptx – Dr Ramhari Poudyal
A three-day training on academic research focusing on analytical tools, held at United Technical College with support from the University Grants Commission, Nepal, 24-26 May 2024.
A review on techniques and modelling methodologies used for checking electrom... – nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automotives. In this paper, the authors have made a non-exhaustive review of research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Mapping presentation THAG big data from space
1. Big Data From the Space
2017 Cycle 1st Mapping Meetings
Outsourcing Partner Sp. z o.o.
Bartosz Szkudlarek
Piotr Zaborowski
2. Outsourcing Partner capabilities on Big Data
We are Outsourcing Partner, a technology company specialized in custom software development and Big Data.
3. Outsourcing Partner capabilities on Big Data
What can we bring? Proven technology experience with common Big Data technologies.
4. Outsourcing Partner capabilities on Big Data
Our experience: six projects in the Big Data domain using Hadoop, Apache Spark and other technologies, including two projects for ESA whose aim was to integrate and visualize massive data.
5. Outsourcing Partner capabilities on Big Data

Project name: European Space Agency, GEOSS Web Portal
Project subject: Data hub portal with search functionality. The objective of this project was to integrate two different data sources on one visualisation platform.
Technologies: HTML5, maps, microservices
Numbers: More than 1 million results; two different data sources.

Project name: European Space Agency, The EO Web (the new website)
Project subject: Proof of concept for a new content architecture for the new Earth Observation website, which collects all information from domain services. The primary purpose of this project was to identify and unify content elements from all EO websites and to provide an efficient mechanism for harvesting, indexing, categorising and searching content.
Technologies: HTML5, Elasticsearch, Kibana, Google Analytics
Numbers: More than 50 websites with technical documentation about missions, instruments and other information connected with the area; over 500k resources identified.

Status legend: Operational, constant dev | Proof of concept | Operational, complete
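As an illustration of the harvest-index-search mechanism described for the EO Web project, here is a hedged sketch using the official Elasticsearch Python client. The index name, document fields, and endpoint are assumptions for the example, not details of the actual ESA system.

```python
# Illustrative sketch: index one harvested EO web page, then search it.
# Index name, fields, and endpoint are invented for the example.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index one harvested page from an EO domain website.
es.index(index="eo-content", document={
    "url": "https://example-eo-site.int/missions/some-mission",
    "title": "Some Mission - Instrument Overview",
    "body": "Technical documentation about the mission instruments ...",
    "category": "missions",
})

# Full-text search across everything harvested so far.
hits = es.search(index="eo-content", query={
    "match": {"body": "instrument calibration"},
})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["url"], hit["_score"])
```

The same pattern scales to the "50+ websites, 500k resources" scope above: harvesters feed documents in, Kibana sits on top for inspection, and the search API serves the unified front end.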
6. Outsourcing Partner capabilities on Big Data

Project name: Telecommunication sector, T-Mobile, Messaging broker
Project subject: Communication exchange between operator and customer is crucial. We implemented a communication broker for text messages (SMS, push notifications, etc.) which allows monitoring of:
• message efficiency (how many reminders are needed to get a user to pay an overdue payment; which message prompts a user to buy an additional internet limit),
• message rules (the system must not send information about an available internet package if the user has already ordered the package through any channel).
Technologies: Cassandra, Apache Hadoop
Numbers: The system handled 15 million customers and 3 million messages per day.

Project name: Telecommunication sector, T-Mobile, Customer self-service system
Project subject: To provide services for customers, the telecommunication company needs many backend systems to support operations. The aim of this project was to implement a mechanism for collecting information about user activities in one repository. Besides the massive amount of data, the challenge was to unify information from many domain systems.
Technologies: ELK stack (Elasticsearch, Kibana, Logstash)

Status legend: Operational, constant dev | Proof of concept | Operational, complete
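A minimal sketch of how a message rule like the one above could be enforced in the broker's outgoing path; the data model and rule are illustrative assumptions, not the production design.

```python
# Sketch of an outgoing-message rule check: suppress an internet package
# offer for users who already ordered one through any channel.
# The rule and data model are illustrative assumptions.

from typing import Set

def may_send(message_type: str,
             user_id: str,
             users_with_package: Set[str]) -> bool:
    """Return True if the message may be sent, False if a rule suppresses it."""
    if message_type == "internet_package_offer" and user_id in users_with_package:
        return False   # rule: user already ordered the package via some channel
    return True

# Example: user "42" bought a package via the mobile app, so the offer is dropped,
# while an unrelated payment reminder still goes out.
already_ordered = {"42"}
print(may_send("internet_package_offer", "42", already_ordered))  # False
print(may_send("payment_reminder", "42", already_ordered))        # True
```

At the stated scale (3 million messages per day), such checks would run against a fast lookup store, which is consistent with the Cassandra choice above.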
7. Outsourcing Partner capabilities on Big Data

Project name: Betterware (retail company), Sales support prediction mechanism
Project subject: Together with Betterware, we analyzed the sales data and singled out the sets of products which are frequently bought together by consumers.
Technologies: Apache Spark, Apache Hadoop, Tableau Software
Numbers: 8,500 customers; 1,000 orders daily; machine learning algorithms trained on 1 million operations (5 years of historical data).

Project name: Insurance company, Integration of customer databases
Project subject: The aim of the project was to integrate data about customers and their operations stored and managed by four different domain systems. The scope of the project included: data analysis and delivery of an integrated domain model; ETL transformation programming; visualization of the data based on Tableau Software.
Technologies: Tableau Software, Amazon AWS
Numbers: 4 domain systems; more than 30 unified domain objects.

Status legend: Operational, constant dev | Proof of concept | Operational, complete
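The slide does not name the algorithm used to single out frequently co-purchased product sets; FP-Growth from Spark MLlib is a representative choice, shown here as a hedged sketch with invented basket data.

```python
# One plausible way to find frequently co-purchased product sets with
# Apache Spark. FP-Growth is shown as a representative algorithm; the
# basket data below is invented for the example.

from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("basket-analysis").getOrCreate()

baskets = spark.createDataFrame(
    [
        (0, ["soap", "sponge", "window cleaner"]),
        (1, ["soap", "window cleaner"]),
        (2, ["sponge", "mop"]),
        (3, ["soap", "sponge"]),
    ],
    ["order_id", "items"],
)

# minSupport=0.4: keep itemsets appearing in at least 40% of baskets.
fp = FPGrowth(itemsCol="items", minSupport=0.4, minConfidence=0.5)
model = fp.fit(baskets)

model.freqItemsets.show()       # product sets bought together often
model.associationRules.show()   # e.g. "soap -> window cleaner" with confidence
spark.stop()
```

The resulting frequent itemsets and association rules are exactly the kind of output that can then be visualized for sales teams in Tableau, matching the technology list above.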
8. Outsourcing Partner capabilities on Big Data

Project name: Electoral Committee of a Candidate for President of the Republic, Media monitoring
Project subject: During the 2015 presidential election in Poland we monitored social media (Facebook, Twitter, YouTube) and digital newspapers. From the data fetched from social media we prepared reports on the popularity of particular candidates, the sentiment of comments connected with the candidates, and the leaders of communities (blog authors, influencers); we also built an algorithm that estimates trending phrases for the political domain.
Technologies: Apache Hadoop, Apache Spark, HTML5 reports

Status legend: Operational, constant dev | Proof of concept | Operational, complete
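The trending-phrase algorithm itself is not described on the slide; one plausible approach is to flag phrases whose share of mentions in the current window is much higher than in a historical baseline, as in this sketch (the scoring rule and sample counts are assumptions).

```python
# Hedged sketch of a trending-phrase estimate: a phrase "trends" when its
# share of mentions in the current window greatly exceeds its share in a
# historical baseline. Scoring rule and sample data are assumptions.

from collections import Counter

def trending(current: Counter, baseline: Counter,
             min_count: int = 3, ratio: float = 2.0) -> list:
    cur_total = sum(current.values()) or 1
    base_total = sum(baseline.values()) or 1
    out = []
    for phrase, count in current.items():
        if count < min_count:
            continue                                       # ignore noise
        cur_share = count / cur_total
        base_share = (baseline[phrase] + 1) / base_total   # +1 smoothing
        if cur_share / base_share >= ratio:
            out.append((phrase, cur_share / base_share))
    return sorted(out, key=lambda p: -p[1])

baseline = Counter({"debate": 10, "economy": 35, "election": 60})
current = Counter({"debate": 30, "economy": 5, "election": 20})
print(trending(current, baseline))  # "debate" rises sharply vs. the baseline
```

In production such counting would run as a Spark job over the full social media stream rather than in-memory Counters, but the scoring idea is the same.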
9. Comments on Big Data from Space (OSP)
• Security and legal recommendations should be defined where applicable.
• 4.4: A policy on services and data location, with its legal consequences, is not referenced. Harmonisation should clarify the strategy and policy towards data localisation and the promoted licensing models and technologies.
• Services reliability.
• 4.5.4.6: Suitable service reliability or reproducibility is needed for industrial development. An availability model should be applied (as in the Ground Segment) to platforms exposed to crowdsourcing and industry, to secure their business models.
• Openness to other data sources.
• 4.5.4.1: Some proven decision-support solutions are based on combining satellite data with other data sources, so an architecture supporting data integration should be considered.
10. Comments on Big Data from Space (OSP)
• Consider the exchangeability aspect.
• 4.5.X.1: Interoperability and exchangeability can be one of the strategy dimensions in cross-domain data flow.
• Consider the architectural influence of organisational data spread on (technical) usability.
• 4.5.2.1: For data organisation (like CDM), the sharding policy should be aligned to current and potential requirements. The solution should enable generic interfaces to be built with awareness of the underlying data distribution, but not of the infrastructure.
• Openness vs. predictability on provided platforms.
• 4.5.3.1: Orchestration and prioritisation: in a shared environment, extensive experiments may coexist with operational periodic/stream analytics that should not be degraded.
11. OSP suggestions for the Big Data from Space Roadmap
Apart from precise mapping of needs and solutions, we suggest considering the following:
• A standardisation advisory body constituted for new and ongoing initiatives would enable natural alignment to the process and consideration of new approaches.
• A catalogue of state-of-the-art, recommended and applied services and technologies, maintained for review by members and industry.
• A layered architecture of systems should be proposed and adopted, with common interfaces to enable interoperability, relocation and third-party value-added service development, with respect to blurred borders and dependencies.
• Federation tactics should be consolidated.
• Industry-related, legal and security policies and strategies should be defined.
12. Conclusions on Big Data from Space (OSP)
The most valuable Big Data projects came from interdisciplinary teams which can juggle data from many different data sources.
13. Conclusions on Big Data from Space (OSP)
Data scientists are mostly mathematicians and physicists. A significant part of them start their experiments from sample datasets such as Iris or Lena. Why can't they use the Agency's resources?
14. Conclusions (OSP)
As an SME with long software and Big Data domain experience, we recognise the following challenges in unlocking data potential according to 5.2 European Strategic Interests:
• High entry threshold: data is closed to non-domain industry companies and research units.
• Current ESA Big Data exploitation projects are silos: there is no collaboration or competition, and no place for a processing workflow.
• There is (possibly) an evaluation gap: resources managed by the Agency are valuable but unevaluated, and there are no (or not many) mechanisms for collecting community feedback and evolving.
• Great data and services are of undefined reliability and partly unpredictable.
15. Conclusions (OSP)
Useful tools to deal with the pitfalls of Big Data exploitation:
• Focusing on potential customers, the Agency should put effort into promoting and exposing the value of the data.
• The data platform should be as open and simple as possible: the Open Data principle.
• Implement mechanisms of collaboration: define subsets; rate and evaluate; share ideas, experiments and results; extend; and finally create processing chains.
• Deliver reliable services that meet industry needs, or enable commercial federation/transition to a business of value-added services.