For internal use only
Move to industrialized Big Data:
mRapid ingestion framework
Paris, 26/09/2016, Edmond SEGALEN
Copyright © 2016 Capgemini and Sogeti – Internal use only. All rights reserved.
Industrializing Big Data projects: mRapid | 26 September 2016
Table of Contents
• Why mRapid?
• What does mRapid do?
• Reference architecture
• How to: process flows
• Roadmap
• Contact information
Why mRapid?
Do you have hundreds of databases, a mainframe, and thousands of files (CSV, flat files, JSON, XML, PDF…) to ingest into a data lake?
To accelerate the ingestion of such volumes of internal or external data, Capgemini created a solution named mRapid.
What does mRapid do?
• It is Capgemini's metadata-driven ingestion framework for data lakes.
• It leverages Capgemini's in-house accelerators as well as Hortonworks DataFlow (HDF) / Apache NiFi for various ingest patterns, such as:
  - JSON to Avro,
  - XML to Avro,
  - RDBMS to Avro,
  - Kafka/JMS ingest,
  - web services ingest,
  - REST API compliance.
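As an illustration of the "JSON to Avro" pattern, here is a minimal sketch of how an Avro record schema could be derived from a sample JSON object. It is not mRapid's implementation: the function names, the `Customer` record name and the sample data are assumptions made for this example, and only flat objects with primitive values are handled.

```python
import json


def avro_type(value):
    """Map a value produced by json.loads to an Avro primitive type name."""
    if value is None:
        return "null"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"nested or unsupported value: {value!r}")


def infer_avro_schema(record_name, sample):
    """Build an Avro record schema (as a dict) from one flat JSON object."""
    fields = [{"name": key, "type": avro_type(val)} for key, val in sample.items()]
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample record; a real source would supply its own data.
sample = json.loads('{"id": 42, "name": "Alice", "active": true, "score": 9.5}')
schema = infer_avro_schema("Customer", sample)
print(json.dumps(schema, indent=2))
```

In a metadata-driven setup, a schema like this would typically be stored alongside the source description and reused by every job that loads the same source.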
Benefits of mRapid:
• Storage options are not limited to Hive; they can be extended to:
  - use the appropriate big data storage technology, such as HDFS or NoSQL, in addition to Hive,
  - leverage efficient storage formats such as Avro, ORC and Parquet,
  - leverage compression codecs such as Snappy and LZMA.
• Shorter time to market and faster onboarding of new source systems.
• Better control over SLA parameters (expected duration, due dates).
• Supports migration from existing workloads as well as existing warehouses and analytics platforms.
• A common, streamlined ingestion utility for various ingestion patterns.
• Reconciliation and exception alerting.
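The reconciliation and SLA points can be pictured with a small sketch. The deck does not describe how mRapid performs these checks, so the `LoadResult` fields, the `check_load` function and the alert wording below are all hypothetical: the sketch simply compares source and target row counts and tests the load duration against an expected duration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class LoadResult:
    """Hypothetical summary of one ingestion run."""
    source_rows: int
    target_rows: int
    duration: timedelta


def check_load(result, expected_duration):
    """Return a list of alert messages; an empty list means the load is clean."""
    alerts = []
    if result.source_rows != result.target_rows:
        gap = abs(result.source_rows - result.target_rows)
        alerts.append(f"reconciliation failed: row counts differ by {gap}")
    if result.duration > expected_duration:
        alerts.append("SLA breached: load exceeded expected duration")
    return alerts


run = LoadResult(source_rows=1000, target_rows=998, duration=timedelta(minutes=45))
for alert in check_load(run, expected_duration=timedelta(minutes=30)):
    print(alert)
```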
Reference architecture
Data ingestion modules process flow
Operations metadata process flow
Source data structure change management process flow
mRapid roadmap
What have we already done?
Ingestion archetypes supported so far:
• Mainframe EBCDIC (fixed width) to HDFS/Hive
• SAS dataset to HDFS/Hive
• Delimited or fixed-width file to HDFS/Hive
• Integration with any industry-standard CDC tool
• Seamless integration with Hadoop platforms
• As-is transfer from native format to Hadoop:
  - JSON
  - XML
  - Weblogs
• Basic integration with HDF/Apache NiFi
• Programmatic creation of hundreds of ingestion jobs
• mRapid exposed as a web service
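To make the "Mainframe EBCDIC (fixed width)" archetype concrete, the sketch below decodes one fixed-width EBCDIC record using a layout held as metadata. It is an illustration only: the `LAYOUT` table, the field names and the choice of code page `cp037` are assumptions, and binary or packed-decimal fields, which mainframe extracts often contain, would need separate handling.

```python
# Hypothetical layout metadata: (field name, byte offset, byte length).
LAYOUT = [("customer_id", 0, 6), ("name", 6, 10), ("country", 16, 2)]


def parse_record(raw, layout, encoding="cp037"):
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    return {
        name: raw[start:start + length].decode(encoding).strip()
        for name, start, length in layout
    }


# Build a sample record by encoding text to EBCDIC, standing in for a mainframe extract.
raw = (
    "000042".encode("cp037")
    + "ALICE".ljust(10).encode("cp037")
    + "FR".encode("cp037")
)
print(parse_record(raw, LAYOUT))
```

Because the layout is data rather than code, the same function can serve any number of sources, which is the idea behind creating many ingestion jobs programmatically.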
What are we building now?
• Enhanced audit logging and operations metadata
• Real-time source integration
• Integration with authentication and authorization tools
• Apache Atlas integration
• Advanced NiFi flows and orchestration with HDF 2.0
• Improved GUI for MetaApp
• Column mapping enhancement
[Diagram: a Data Steward calls the mRAPID job creation service with a SOAP XML message; a Command Centre or external app calls the mRAPID job execution service with a SOAP XML message.]
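As a rough picture of what a SOAP XML message to a job creation service might look like, the sketch below builds a SOAP 1.1 envelope with the Python standard library. The deck does not document the actual message schema, so the `CreateJob` operation, its `Source`, `Target` and `Format` elements, and the `urn:example:mrapid:jobs` namespace are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
JOB_NS = "urn:example:mrapid:jobs"  # hypothetical namespace, not taken from the deck


def build_create_job_request(source, target, fmt):
    """Return a SOAP 1.1 envelope (str) for a hypothetical CreateJob operation."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{JOB_NS}}}CreateJob")
    for tag, text in (("Source", source), ("Target", target), ("Format", fmt)):
        ET.SubElement(request, f"{{{JOB_NS}}}{tag}").text = text
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical source and target identifiers, for illustration only.
message = build_create_job_request("crm_db.customers", "hive:lake.customers", "avro")
print(message)
```

A job execution request could follow the same shape with a different operation element in the body.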
Contact information
Manuel Sevilla
I&D Global
Global Head of Big Data, Analytics and MDM
manuel.sevilla@capgemini.com

Anne-Laure Thieullent
I&D Global
Big Data Europe Director
annelaure.thieullent@capgemini.com

Sunil Patil
I&D India
Manager
sunil.p.patil@capgemini.com

Edmond SEGALEN
I&D France
Chief Architect
edmond.segalen@capgemini.com
www.capgemini.com
The information contained in this presentation is proprietary and confidential.
It is for Capgemini and Sogeti internal use only. Copyright © 2016 Capgemini and Sogeti. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
No part of this document may be modified, deleted or expanded by any process or means without prior written permission from Capgemini.
www.sogeti.com
About Capgemini and Sogeti
With more than 180,000 people in over 40 countries, Capgemini is a
global leader in consulting, technology and outsourcing services. The
Group reported 2015 global revenues of EUR 11.9 billion. Together
with its clients, Capgemini creates and delivers business, technology
and digital solutions that fit their needs, enabling them to achieve
innovation and competitiveness. A deeply multicultural organization,
Capgemini has developed its own way of working, the Collaborative
Business Experience™, and draws on Rightshore®, its worldwide
delivery model.
Sogeti is a leading provider of technology and software testing,
specializing in Application, Infrastructure and Engineering
Services. Sogeti offers cutting-edge solutions around Testing,
Business Intelligence & Analytics, Mobile, Cloud and Cyber
Security. Sogeti brings together more than 23,000 professionals in
15 countries and has a strong local presence in over 100 locations
in Europe, USA and India. Sogeti is a wholly-owned subsidiary of
Cap Gemini S.A., listed on the Paris Stock Exchange.
