Cwin16 - Paris - mRapid

  1. For internal use only. Moving to industrialized Big Data: the mRapid ingestion framework. Paris, 26/09/2016, Edmond SEGALEN
  2. Copyright © 2016 Capgemini and Sogeti – Internal use only. All rights reserved. Industrialization of Big Data projects: mRapid | 26 September 2016
      Table of Contents
      • Why mRapid?
      • What does mRapid do?
      • Reference architecture
      • How-to: process flows
      • Roadmap
      • Contact information
  3. Why mRapid?
      Do you have hundreds of databases, a mainframe, and thousands of files (CSV, flat files, JSON, XML, PDF…) to ingest into a data lake? To accelerate the ingestion of such volumes of internal and external data, Capgemini created a solution named mRapid.
  5. What does mRapid do?
      • It is Capgemini's metadata-driven ingestion framework for data lakes.
      • It leverages Capgemini's in-house accelerators as well as Hortonworks DataFlow (HDF)/Apache NiFi for various ingestion patterns, such as:
          ◦ JSON to Avro
          ◦ XML to Avro
          ◦ RDBMS to Avro
          ◦ Kafka/JMS ingestion
          ◦ Web services ingestion
          ◦ REST API compliance
      Benefits of mRapid:
      • Storage options are NOT limited to Hive: they can be extended to whichever big data storage technology fits, such as HDFS or NoSQL stores, in addition to Hive.
      • Efficient storage formats such as Avro, ORC and Parquet.
      • Compression codecs such as Snappy and LZMA.
      • Lower time to market and faster onboarding of new source systems.
      • Better control over SLA parameters (expected duration, due dates).
      • Support for migrating existing workloads, as well as existing warehouses and analytics platforms.
      • A common, streamlined ingestion utility for the various ingestion patterns.
      • Reconciliation and exception alerting.
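The slides don't describe mRapid's metadata model, so the following is only a minimal Python sketch of the general "metadata-driven" idea: each row of a hypothetical source-metadata table names a source, an ingestion pattern, a target store, a file format and a codec, and one generic function turns those rows into job specifications instead of someone hand-writing a job per source. `SourceMetadata`, `IngestionJob`, `build_jobs`, the pattern identifiers and the target-path convention are all assumptions for illustration, not mRapid's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical identifiers for a subset of the ingestion patterns listed above.
SUPPORTED_PATTERNS = {"json_to_avro", "xml_to_avro", "rdbms_to_avro", "kafka", "jms", "web_service"}
SUPPORTED_FORMATS = {"avro", "orc", "parquet"}
SUPPORTED_CODECS = {"snappy", "lzma", "none"}


@dataclass(frozen=True)
class SourceMetadata:
    """One row of a hypothetical source-metadata table."""
    name: str
    pattern: str
    target: str       # e.g. "hive", "hdfs", "nosql"
    file_format: str
    codec: str


@dataclass(frozen=True)
class IngestionJob:
    """Job specification generated from metadata (hypothetical shape)."""
    job_id: str
    source: str
    pattern: str
    target_path: str
    file_format: str
    codec: str


def build_jobs(rows: Iterable[SourceMetadata]) -> Tuple[List[IngestionJob], List[str]]:
    """Turn metadata rows into job specs; collect errors for rows that can't be handled."""
    jobs: List[IngestionJob] = []
    errors: List[str] = []
    for row in rows:
        if row.pattern not in SUPPORTED_PATTERNS:
            errors.append(f"{row.name}: unsupported pattern {row.pattern!r}")
            continue
        if row.file_format not in SUPPORTED_FORMATS or row.codec not in SUPPORTED_CODECS:
            errors.append(f"{row.name}: unsupported format/codec {row.file_format}/{row.codec}")
            continue
        jobs.append(IngestionJob(
            job_id=f"ingest_{row.name}",
            source=row.name,
            pattern=row.pattern,
            target_path=f"/{row.target}/raw/{row.name}",  # assumed landing-path convention
            file_format=row.file_format,
            codec=row.codec,
        ))
    return jobs, errors


if __name__ == "__main__":
    rows = [
        SourceMetadata("orders", "rdbms_to_avro", "hive", "avro", "snappy"),
        SourceMetadata("clicks", "json_to_avro", "hdfs", "parquet", "snappy"),
        SourceMetadata("legacy", "ftp", "hdfs", "orc", "none"),
    ]
    jobs, errors = build_jobs(rows)
    for job in jobs:
        print(job.job_id, job.pattern, job.target_path)
    for error in errors:
        print("rejected:", error)
```

The point of this shape is that onboarding a new source becomes adding a metadata row rather than writing new code, which is presumably where the "faster onboarding" benefit comes from.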
  7. Reference architecture
  9. Data ingestion modules process flow
  10. Operations metadata process flow
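This slide's process-flow diagram did not survive extraction. As a hedged illustration of how operations metadata could back the reconciliation, exception alerting and SLA control listed among mRapid's benefits, here is a minimal Python sketch in which each run records its timing and record counts, and a check produces alert messages. The `JobRun` and `Sla` fields, the `check_run` function, and the rule that every source record must be either loaded or explicitly rejected are assumptions, not mRapid's documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class JobRun:
    """Hypothetical operations-metadata record for one ingestion run."""
    job_id: str
    started_at: datetime
    ended_at: datetime
    source_count: int    # records read from the source system
    target_count: int    # records landed in the data lake
    rejected_count: int  # records routed to an exception/reject area


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA parameters: expected duration and due date."""
    expected_duration: timedelta
    due_at: datetime


def check_run(run: JobRun, sla: Sla) -> List[str]:
    """Return alert messages; an empty list means reconciled and within SLA."""
    alerts: List[str] = []
    # Reconciliation rule (assumed): every source record is either loaded or rejected.
    if run.source_count != run.target_count + run.rejected_count:
        alerts.append(
            f"{run.job_id}: reconciliation failed "
            f"(source={run.source_count}, target={run.target_count}, "
            f"rejected={run.rejected_count})"
        )
    # Exception alerting: rejected records are reported even when counts reconcile.
    if run.rejected_count:
        alerts.append(f"{run.job_id}: {run.rejected_count} record(s) rejected")
    # SLA checks against the expected duration and the due date.
    duration = run.ended_at - run.started_at
    if duration > sla.expected_duration:
        alerts.append(f"{run.job_id}: took {duration}, expected at most {sla.expected_duration}")
    if run.ended_at > sla.due_at:
        alerts.append(f"{run.job_id}: finished after due time {sla.due_at:%Y-%m-%d %H:%M}")
    return alerts


if __name__ == "__main__":
    start = datetime(2016, 9, 26, 1, 0)
    sla = Sla(expected_duration=timedelta(minutes=30), due_at=datetime(2016, 9, 26, 1, 45))
    run = JobRun("ingest_orders", start, start + timedelta(minutes=50), 1000, 990, 5)
    for alert in check_run(run, sla):
        print(alert)
```

Keeping the counts and timings as data, rather than only in log files, is what would let a single check like this run uniformly across hundreds of jobs.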
  11. Source data structure change management process flow
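Only the title of this slide survived. One common way to handle source structure changes, sketched here under stated assumptions, is to compare the column-to-type mapping from the previous load with the one just received and route the change according to a policy. The `diff_schema` and `classify_change` names and the policy itself (new columns evolve automatically; dropped or retyped columns go to manual review) are hypothetical. The policy is loosely motivated by the fact that Avro schema resolution tolerates a reader schema with added fields that have defaults, while type changes usually need more care.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type, e.g. as read from a Hive table or Avro schema


def diff_schema(old: Schema, new: Schema) -> Dict[str, List[str]]:
    """Classify column-level differences between the previous and the incoming schema."""
    return {
        "added": [col for col in new if col not in old],
        "removed": [col for col in old if col not in new],
        "retyped": [col for col in old if col in new and old[col] != new[col]],
    }


def classify_change(diff: Dict[str, List[str]]) -> str:
    """Hypothetical policy: additive changes evolve automatically, others need review."""
    if diff["removed"] or diff["retyped"]:
        return "manual_review"
    if diff["added"]:
        return "auto_evolve"
    return "no_change"


if __name__ == "__main__":
    previous = {"id": "int", "name": "string", "amount": "decimal(10,2)"}
    incoming = {"id": "bigint", "name": "string", "amount": "decimal(10,2)", "currency": "string"}
    change = diff_schema(previous, incoming)
    print(change)
    print(classify_change(change))
```

In this example the new `currency` column alone would be applied automatically, but the `id` type change forces the whole change set into review.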
  13. mRapid roadmap
      What have we already done?
      Ingestion archetypes covered at the moment:
      • Mainframe EBCDIC (fixed width) to HDFS/Hive
      • SAS dataset to HDFS/Hive
      • Delimited or fixed-width file to HDFS/Hive
      • Integration with any industry-standard CDC tool
      • Seamless integration with Hadoop platforms
      • As-is transfer from native formats to Hadoop:
          ◦ JSON
          ◦ XML
          ◦ Weblogs
      • Basic integration with HDF/Apache NiFi
      • Programmatic creation of hundreds of ingestion jobs
      • mRapid exposed as a web service
      What are we building now?
      • Enhanced audit logging and operations metadata
      • Real-time source integration
      • Integration with authentication and authorization tools
      • Apache Atlas integration
      • Advanced NiFi flows and orchestration with HDF 2.0
      • Improved MetaApp GUI
      • Column mapping enhancements
      [Diagram: Data Steward → mRAPID job creation service (SOAP XML message); Command Centre / External App → mRAPID job execution service (SOAP XML message)]
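For the "Mainframe EBCDIC (fixed width)" archetype, the core decoding step can be sketched in Python with the standard library's `cp037` codec (EBCDIC US/Canada). This is not mRapid's implementation: the `LAYOUT` below is a hypothetical stand-in for what would normally be derived from a COBOL copybook, and the sketch only handles character (display) fields. Packed-decimal (COMP-3) or binary fields would need separate decoding, and a real mainframe feed may use a different EBCDIC code page (for example `cp500`, also available in Python).

```python
from typing import Dict, List, Tuple

# Hypothetical copybook-style layout: (field name, byte offset, byte length).
LAYOUT: List[Tuple[str, int, int]] = [
    ("customer_id", 0, 6),
    ("name", 6, 10),
    ("country", 16, 2),
]
RECORD_LENGTH = 18  # sum of the field lengths above


def parse_ebcdic_records(
    data: bytes,
    layout: List[Tuple[str, int, int]] = LAYOUT,
    record_length: int = RECORD_LENGTH,
    codec: str = "cp037",
) -> List[Dict[str, str]]:
    """Split a fixed-width EBCDIC byte stream into records and decode each text field."""
    if len(data) % record_length:
        raise ValueError("data length is not a multiple of the record length")
    records = []
    for start in range(0, len(data), record_length):
        raw = data[start:start + record_length]
        # Decode each field's byte slice and drop the fixed-width space padding.
        records.append({
            name: raw[offset:offset + length].decode(codec).rstrip()
            for name, offset, length in layout
        })
    return records


if __name__ == "__main__":
    sample = ("000042" + "DUPONT".ljust(10) + "FR").encode("cp037")
    for record in parse_ebcdic_records(sample):
        print(record)
```

The decoded dictionaries could then be written to HDFS/Hive in whatever format the job metadata specifies.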
  15. Contact information
      • Manuel Sevilla, I&D Global, Global Head of Big Data, Analytics and MDM, manuel.sevilla@capgemini.com
      • Anne-Laure Thieullent, I&D Global, Big Data Europe Director, annelaure.thieullent@capgemini.com
      • Sunil Patil, I&D India, Manager, sunil.p.patil@capgemini.com
      • Edmond SEGALEN, I&D France, Chief Architect, edmond.segalen@capgemini.com
  16. www.capgemini.com / www.sogeti.com
      The information contained in this presentation is proprietary and confidential. It is for Capgemini and Sogeti internal use only. Copyright © 2016 Capgemini and Sogeti. All rights reserved. Rightshore® is a trademark belonging to Capgemini. No part of this document may be modified, deleted or expanded by any process or means without prior written permission from Capgemini.
      About Capgemini and Sogeti
      With more than 180,000 people in over 40 countries, Capgemini is a global leader in consulting, technology and outsourcing services. The Group reported 2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
      Sogeti is a leading provider of technology and software testing, specializing in Application, Infrastructure and Engineering Services. Sogeti offers cutting-edge solutions around Testing, Business Intelligence & Analytics, Mobile, Cloud and Cyber Security. Sogeti brings together more than 23,000 professionals in 15 countries and has a strong local presence in over 100 locations in Europe, USA and India. Sogeti is a wholly owned subsidiary of Cap Gemini S.A., listed on the Paris Stock Exchange.
