Scigility
Scire propter agere

Big Data and Data Science for
traditional Swiss companies
Dr. Daniel Fasel
Scigility AG
www.scigility.com
Who are we?
–  Dr. Daniel Fasel was the first data scientist on the
business intelligence team at Swisscom and played a key role in
implementing NoSQL technologies for explorative analytics
during his time there.

–  Prof. Dr. Philippe Cudré-Mauroux is the director of the
eXascale Infolab and Professor at the University of Fribourg
in Switzerland. Previously, he was a postdoctoral associate
working in the Database Systems group at MIT. He also
worked on distributed information and media management for
HP, IBM Watson Research (NY), and Microsoft Research
Asia.

Our services
•  We provide consulting, software
development, operations and training in the
areas of large-scale information systems,
NoSQL technologies and real-time streaming
solutions: the technologies commonly known as
Big Data.
•  We provide the optimal solution to your
specific IT problems, based on the latest and
most appropriate technologies available!

Content
•  Short overview of Big Data
•  Techniques & Technologies
•  3 demonstrations of how to use these new
techniques and technologies

Big Data
•  Big – Volume
–  Big is relative
•  Google > 400 PB
•  Traditional Swiss companies < 400 PB ;-)

–  But Big Data (Volume) is a fact
•  Data grows constantly
•  More and more systems produce data
•  Classical relational schemas already pre-select
the data
•  There is more interesting data in your company than
you might assume (dark data on the intranet or file
shares, application silos, etc.)

Big Data
•  Variety
–  Multi-structured data

Big Data
•  Variety
–  Structures change over time
•  Think about your legacy system

–  The structure of data is determined at the time
of analysis and not at the time of storing
•  Schema on purpose / schema on read

–  Combination of classical and new data
sources
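
To make "schema on read" concrete, here is a minimal sketch in Python. It assumes raw events are stored untouched as JSON lines; the record fields and the two reader functions are hypothetical and only illustrate that the structure is chosen by each analysis, not by the storage layer.

```python
import json

# Hypothetical raw records, stored as-is without a fixed schema.
# Note that the newest record names the user field differently.
RAW_EVENTS = [
    '{"user": "anna", "action": "login", "ts": 1}',
    '{"user": "beat", "action": "purchase", "amount": 42.5, "ts": 2}',
    '{"customer": "anna", "action": "purchase", "amount": 10.0, "ts": 3}',
]


def read_purchases(raw_lines):
    """Apply a purchase-oriented schema at read time."""
    purchases = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("action") != "purchase":
            continue
        # Reconcile the changed field name while reading, not while storing.
        user = record.get("user") or record.get("customer")
        purchases.append({"user": user, "amount": float(record["amount"])})
    return purchases


def read_activity(raw_lines):
    """Apply a different schema to the same raw data: events per user."""
    counts = {}
    for line in raw_lines:
        record = json.loads(line)
        user = record.get("user") or record.get("customer")
        counts[user] = counts.get(user, 0) + 1
    return counts
```

Both readers work on the same stored lines; a new question only needs a new reader, not a migration of the stored data.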

Big Data
•  Velocity
–  Data is produced faster
–  Data becomes more and more ephemeral
–  Analytics becomes real-time
–  Data flows are as interesting as the data itself

Big Data
•  What is the innovation of Big Data?
•  The real innovations of Big Data are
–  optimized techniques and technologies
–  that address the specific characteristics
commonly summarized as Big Data

•  Big Data is not a new kind of data!
–  You all have Big Data!

Techniques
•  New techniques
–  MapReduce for analysis of highly distributed
data
–  Combination of linear algebra, statistics,
computer science and visualization for broader
user groups
–  Improved agility and rapid prototyping
–  Explorative analytics
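
As an illustration of the MapReduce technique named above, here is a small single-process sketch in Python. It is not a Hadoop job: the function names are hypothetical, and the shuffle step that a real framework performs across machines is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different node; here we just chain.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data", "Big volume"])` counts words across both documents. Because the map calls are independent of each other, the same structure can be spread over many machines holding different parts of the data.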

Technologies
•  New technologies
–  Massively horizontally scalable / elastic
–  Optimized for specific types of problems
–  Not necessarily ACID compliant
–  Follow BASE & CAP concepts
–  Integration and combination of classical and
new technologies (like SQL-MR)

Demonstration 1
•  Indexing and search on multi-structured
data with Autonomy

Demonstration 2
•  Real-time streaming with Storm
[Diagram components: Twitter, Storm (Spout, Bolt), Redis, Node.js / Socket.io]
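
The spout and bolt roles from the diagram can be sketched in plain Python. This is not the Storm API and not the code used in the demonstration: generators stand in for spouts and bolts, a `Counter` stands in for Redis, and the hashtag-counting task and all function names are assumptions made for illustration.

```python
from collections import Counter


def tweet_spout(tweets):
    """Spout: the source of the stream; emits one tuple per tweet."""
    for text in tweets:
        yield (text,)


def hashtag_bolt(stream):
    """Bolt: consumes tuples and emits zero or more new tuples each."""
    for (text,) in stream:
        for token in text.split():
            if token.startswith("#") and len(token) > 1:
                yield (token.lower(),)


def count_bolt(stream, store):
    """Final bolt: updates running counts in the store (stand-in for Redis)."""
    for (tag,) in stream:
        store[tag] += 1
    return store


def run_topology(tweets):
    # Wire spout -> bolt -> bolt; tuples flow through one at a time.
    store = Counter()
    return count_bolt(hashtag_bolt(tweet_spout(tweets)), store)
```

In a setup like the one in the diagram, a web front end such as Node.js with Socket.io could read the running counts from the store and push them to browsers; that part is left out of the sketch.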

Demonstration 3
•  Collaborative filtering, path analysis and
visualization with AsterData
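
For readers unfamiliar with collaborative filtering, here is a minimal user-based sketch in Python. It does not use AsterData or reflect how the demonstration was built; the ratings, names and scoring rule (cosine similarity between users, then similarity-weighted ratings) are assumptions chosen to keep the example small.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
RATINGS = {
    "anna": {"A": 5, "B": 4, "C": 1},
    "beat": {"A": 4, "B": 5, "D": 4},
    "carla": {"C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]
```

`recommend("anna", RATINGS)` suggests the one item Anna has not rated yet, weighted by how similar the other users' tastes are to hers.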

Thank you!
•  Questions?
•  You can contact us:
–  Tel.: +41 79 202 47 89
–  df@scigility.com
–  www.scigility.com
