QCon 2018 | Gimel | PayPal's Analytic Platform

Site | https://www.infoq.com/qconai2018/

Youtube | https://www.youtube.com/watch?v=2h0biIli2F4&t=19s

At PayPal, data engineers, analysts and data scientists work with a variety of data sources (messaging, NoSQL, RDBMS, documents, TSDB), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL) and execution models (stream, batch, interactive).
Because of this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs and optimizations, which hurts time-to-market (TTM). To solve this problem and make product development more effective, the PayPal Data Platform team developed "Gimel", a unified analytics data platform that provides access to any storage through a single unified Data API and SQL, both powered by a centralized data catalog.
In this session, we will introduce the components of Gimel - the compute platform, Data API, PCatalog, GSQL and notebooks - and give a demo showing how Gimel reduces TTM by letting engineers write a single line of code to access any storage, without having to know the complexity behind the scenes.
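To make the "single line of code" idea concrete, here is a minimal sketch of a Data API read from Spark; the dataset name is illustrative and the package name is assumed from the open-source repository, while the DataSet/read calls themselves are the ones shown on the slides below.

    import org.apache.spark.sql.SparkSession
    import com.paypal.gimel.DataSet   // package path assumed

    val spark = SparkSession.builder()
      .appName("gimel-flights-demo")
      .enableHiveSupport()
      .getOrCreate()

    // One call, regardless of whether the dataset is backed by Kafka, HBase, Hive, Elasticsearch, ...
    val dataSet = DataSet(spark)
    val flightsDf = dataSet.read("pcatalog.KAFKA_flights_dataset", Map.empty[String, Any])
    flightsDf.show(10)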

QCon 2018 | Gimel | PayPal's Analytic Platform

  1. 1. Gimel Data Platform Overview
  2. 2. Agenda ©2018 PayPal Inc. Confidential and proprietary. 2 • Introduction • PayPal & Big Data Space • Analytics Platform & Gimel • Why Gimel • Challenges in Analytics • Walk through simple use case • Gimel – Implementation Details • Gimel – Open Source & future • Q & A
  3. 3. About Us • Product manager, data processing products at PayPal • 20 years in data and analytics across the networking, semiconductor, telecom, security and fintech industries • Data warehouse developer, BI program manager, data product manager romehta@paypal.com https://www.linkedin.com/in/romit-mehta/ ©2018 PayPal Inc. Confidential and proprietary. 3 Romit Mehta • Big data platform engineer at PayPal • 13 years in data engineering, 5 years in scalable big data solutions • Developed several Spark-based solutions across NoSQL, key-value, messaging, document-based & relational systems dmohanakumarchan@paypal.com https://www.linkedin.com/in/deepakmc/ Deepak Mohanakumar Chandramouli
  4. 4. PayPal – Key Metrics 4©2018 PayPal Inc. Confidential and proprietary.
  5. 5. PayPal Customers, Transactions and Growth 5 From: https://www.paypal.com/us/webapps/mpp/about
  6. 6. PayPal Big Data Platform 6 13 prod clusters, 12 non-prod clusters GPU co-located with Hadoop 150+ PB Data 40,000+ YARN jobs/day One of the largest Aerospike, Teradata, Hortonworks and Oracle installations Compute supported: MR, Pig, Hive, Spark, Beam
  7. 7. PayPal Analytics Ecosystem and Gimel Platform (Unified Data Processing Platform) 7©2018 PayPal Inc. Confidential and proprietary.
  8. 8. Gimel platform layers (diagram): user experience and access (Notebooks, R Studio, BI tools) for developers, data scientists, analysts and operators; the Gimel Data Platform (Gimel SDK, Notebooks, PCatalog, Data API); compute frameworks and APIs; and infrastructure services leveraged for elasticity and redundancy (multi-DC, public cloud, predictive resource allocation, logging, monitoring, alerting, security, application lifecycle management).
  9. 9. Why Gimel? 9
  10. 10. Use case - Flights Cancelled
  11. 11. Use case challenges … (diagram) ©2018 PayPal Inc. Confidential and proprietary. 11 Data points (flights, events, airports, airlines, carrier, geography & geo tags) flow from data sources (Kafka, Teradata, external HDFS/Hive) through ingest, extract/load, stream/process, data prep/availability (Parquet/ORC/Text?), load and publish of real-time/processed data for analysis; every step must be productionalized with logging, monitoring, alerting, auditing and data quality.
  12. 12. ©2018 PayPal Inc. Confidential and proprietary. 12 Spark Read From Hbase Data Access Code is Cumbersome and Fragile
  13. 13. ©2018 PayPal Inc. Confidential and proprietary. 13 Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid Data Access Code is Cumbersome and Fragile
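      To see why this is painful, the store-specific code on these slides looks roughly like the sketch below (hosts, catalogs and option names are illustrative and vary by connector version); each store needs its own connector artifact, configuration format and tuning.

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

        val spark = SparkSession.builder().appName("native-reads").enableHiveSupport().getOrCreate()

        // HBase via the SHC connector: a JSON "catalog" mapping column families and
        // qualifiers to DataFrame columns has to be hand-written for every table.
        val hbaseCatalog =
          """{
            |  "table":   {"namespace":"default", "name":"flights"},
            |  "rowkey":  "key",
            |  "columns": {
            |    "flight_id": {"cf":"rowkey", "col":"key",       "type":"string"},
            |    "cancelled": {"cf":"f",      "col":"cancelled", "type":"string"}
            |  }
            |}""".stripMargin
        val hbaseDf = spark.read
          .options(Map(HBaseTableCatalog.tableCatalog -> hbaseCatalog))
          .format("org.apache.spark.sql.execution.datasources.hbase")
          .load()

        // Elasticsearch via es-spark: a different format, different option names, different tuning.
        val esDf = spark.read
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "es-host:9200")
          .load("flights/events")

        // ... and yet another pattern each for Aerospike and Druid.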
  14. 14. ©2018 PayPal Inc. Confidential and proprietary. 14 Datasets Challenges Data access tied to compute and data store versions Hard to find available data sets Storage-specific dataset creation results in duplication and increased latency No audit trail for dataset access No standards for on-boarding data sets for others to discover No statistics on data set usage and access trends Datasets
  15. 15. ©2018 PayPal Inc. Confidential and proprietary. 15 High-friction Data Application Lifecycle: onboarding a big data app means Learn > Code > Optimize > Build > Deploy > Run, and the whole cycle repeats every time the compute engine changes, the compute version is upgraded, the storage API changes, the storage connector is upgraded, storage hosts are migrated, or the storage itself changes.
  16. 16. Gimel 16
  17. 17. Gimel | Flights Cancelled Search PCatalog 17
  18. 18. Sign in to the PCatalog portal and search for your datasets
  19. 19. Find your datasets
  20. 20. Gimel DataSet | Overview
  21. 21. Gimel DataSet | Schema Spec Relational DataSet Kafka DataSet
  22. 22. Gimel DataSet | System Spec Relational DataSet Kafka DataSet
  23. 23. Gimel DataSet | Object Spec Kafka DataSet Relational DataSet
  24. 24. Gimel DataSet | Availability
  25. 25. Find your datasets | Recap
  26. 26. Gimel | Flights Cancelled Analyze & Productionalize App 26
  27. 27. Access datasets: Navigate to Jupyter notebooks & analyze data
  28. 28. Setup the Application
  29. 29. Data API
  30. 30. Gimel | Flights App | Summary 30
  31. 31. Use case challenges - simplified with Gimel (diagram) ©2018 PayPal Inc. Confidential and proprietary. 31 With the Data API, PCatalog and tools (Gimel & notebooks), the same data points (flights, events, airports, airlines, carrier, geography & geo tags) and data sources (Kafka, Teradata, external HDFS/Hive) flow through ingest, extract/load, process, data prep/availability (Parquet/ORC/Text?), load and publish for analysis, with productionalization, logging, monitoring, alerting, auditing and data QC.
  32. 32. ©2018 PayPal Inc. Confidential and proprietary. Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid With Data API ✔ Data Access Simplified with Gimel Data API 32
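      With the Data API the four store-specific reads above collapse into the same call; a rough sketch (dataset names are illustrative, the package path is assumed, and the read signature is the one shown on the Data API slides):

        import org.apache.spark.sql.SparkSession
        import com.paypal.gimel.DataSet   // package path assumed

        val spark = SparkSession.builder().appName("gimel-reads").enableHiveSupport().getOrCreate()
        val dataSet = DataSet(spark)
        val options = Map.empty[String, Any]

        // Only the PCatalog dataset name changes; connectors, versions and tuning stay behind the API.
        val hbaseDf = dataSet.read("pcatalog.HBASE_flights_dataset", options)
        val esDf    = dataSet.read("pcatalog.ES_flights_dataset", options)
        val aeroDf  = dataSet.read("pcatalog.AEROSPIKE_flights_dataset", options)
        val druidDf = dataSet.read("pcatalog.DRUID_flights_dataset", options)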
  33. 33. ©2018 PayPal Inc. Confidential and proprietary. Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid With Data API ✔ SQL Support in Gimel Data Platform 33
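      The SQL path works the same way: a single GSQL statement over PCatalog dataset names can be submitted from any Spark application through the query processor. A sketch, assuming the GimelQueryProcessor entry point named on later slides (its package path is assumed) and adapting the SQL from the API-anatomy slide:

        import org.apache.spark.sql.SparkSession
        import com.paypal.gimel.sql.GimelQueryProcessor   // package path assumed

        val spark = SparkSession.builder().appName("gimel-sql").enableHiveSupport().getOrCreate()

        val sql =
          """insert into pcatalog.HIVE_dataset partition(yyyy,mm,dd,hh)
            |select kafka_ds.*, gimel_load_id
            |      ,substr(commit_timestamp,1,4)  as yyyy
            |      ,substr(commit_timestamp,6,2)  as mm
            |      ,substr(commit_timestamp,9,2)  as dd
            |      ,substr(commit_timestamp,12,2) as hh
            |from pcatalog.KAFKA_dataset kafka_ds
            |join default.geo_lkp lkp on kafka_ds.zip = lkp.zip
            |where lkp.region = 'MIDWEST'""".stripMargin

        // In a notebook, the %%gimel magic calls the same entry point.
        GimelQueryProcessor.executeBatch(sql, spark)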
  34. 34. ©2018 PayPal Inc. Confidential and proprietary. 34 Data Application Lifecycle with Data API: onboarding a big data app is still Learn > Code > Optimize > Build > Deploy > Run, but when the compute engine changes, the compute version is upgraded, the storage API changes, the storage connector is upgraded, storage hosts are migrated, or the storage changes, the only step left is Run.
  35. 35. Gimel – Deep Dive 35
  36. 36. PayPal Analytics Ecosystem (diagram) ©2018 PayPal Inc. Confidential and proprietary. A Livy-based job server on the grid runs batch and interactive workloads (Livy API, NAS, Sparkling Water), alongside a Spark Thrift Server, metrics and a history server; logs are indexed and searchable; xDiscovery scans storages to discover and maintain catalog metadata, exposed through the metadata services and the PCatalog UI for exploring and configuring datasets.
  37. 37. A peek into Streaming SQL and Batch SQL ©2018 PayPal Inc. Confidential and proprietary. 37
      Streaming SQL launches a Spark Streaming app:
        -- Streaming window seconds
        set gimel.kafka.throttle.streaming.window.seconds=10;
        -- Throttling
        set gimel.kafka.throttle.streaming.maxRatePerPartition=1500;
        -- ZK checkpoint root path
        set gimel.kafka.consumer.checkpoint.root=/checkpoints/appname;
        -- Checkpoint enabling flag - implicitly checkpoints after each mini-batch in streaming
        set gimel.kafka.reader.checkpoint.save.enabled=true;
        -- Jupyter magic for streaming SQL on notebooks (interactive use cases);
        -- the same magic works in the Livy REPL (streaming use cases)
        %%gimel-stream
        -- Assume a pre-split HBASE table as an example
        insert into pcatalog.HBASE_dataset
        select cust_id, kafka_ds.* from pcatalog.KAFKA_dataset kafka_ds;
      Batch SQL launches a Spark batch app:
        -- Establish 10 concurrent connections per Topic-Partition
        set gimel.kafka.throttle.batch.parallelsPerPartition=10;
        -- Fetch at max - 10 M messages from each partition
        set gimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000;
        -- Jupyter magic on notebooks (interactive use cases); same magic works in the Livy REPL (batch use cases)
        %%gimel
        insert into pcatalog.HIVE_dataset partition(yyyy,mm,dd,hh,mi)
        select kafka_ds.*, gimel_load_id
          ,substr(commit_timestamp,1,4)  as yyyy
          ,substr(commit_timestamp,6,2)  as mm
          ,substr(commit_timestamp,9,2)  as dd
          ,substr(commit_timestamp,12,2) as hh
          ,case when cast(substr(commit_timestamp,15,2) as INT) <= 30 then "00" else "30" end as mi
        from pcatalog.KAFKA_dataset kafka_ds;
      Jupyter/Livy magic terms:
        • %%gimel : calls gimel.executeBatch(sql)
        • %%gimel-stream : calls gimel.executeStream(sql)
  38. 38. Anatomy of the API ©2018 PayPal Inc. Confidential and proprietary.
      Factories:
        gimel.dataset.factory { KafkaDataSet ElasticSearchDataSet DruidDataSet HiveDataSet AerospikeDataSet HbaseDataSet CassandraDataSet JDBCDataSet }
        gimel.datastream.factory { KafkaDataStream }
      Metadata services:
        CatalogProvider.getDataSetProperties("dataSetName")
      User-facing calls:
        dataSet.read("dataSetName", options)
        dataSet.write(dataToWrite, "dataSetName", options)
        dataStream.read("dataSetName", options)
      Under the hood, the factory returns a store-specific implementation:
        val storageDataSet = getFromFactory(type = "Hive")
        val storageDataStream = getFromStreamFactory(type = "kafka")
        kafkaDataSet.read("dataSetName", options)
        hiveDataSet.write(dataToWrite, "dataSetName", options)
        storageDataStream.read("dataSetName", options)
      Core connector implementation (example - Kafka): a combination of open-source connectors (such as DataStax, SHC, ES-Spark) and in-house implementations.
      How a GSQL statement is resolved - given:
        %%gimel
        -- Establish 10 concurrent connections per Topic-Partition
        set gimel.kafka.throttle.batch.parallelsPerPartition=10;
        -- Fetch at max - 10 M messages from each partition
        set gimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000;
        insert into pcatalog.HIVE_dataset partition(yyyy,mm,dd,hh,mi)
        select kafka_ds.*, gimel_load_id
          ,substr(commit_timestamp,1,4)  as yyyy
          ,substr(commit_timestamp,6,2)  as mm
          ,substr(commit_timestamp,9,2)  as dd
          ,substr(commit_timestamp,12,2) as hh
        from pcatalog.KAFKA_dataset kafka_ds
        join default.geo_lkp lkp on kafka_ds.zip = lkp.zip
        where lkp.region = 'MIDWEST'
      the query processor executes roughly:
        val dataSet: gimel.DataSet = DataSet(sparkSession)
        val df1 = dataSet.read("pcatalog.KAFKA_dataset", options); df1.createGlobalTempView("tmp_abc123")
        val resolvedSelectSQL = selectSQL.replace("pcatalog.KAFKA_dataset", "tmp_abc123")
        val readDf: DataFrame = sparkSession.sql(resolvedSelectSQL)
        dataSet.write("pcatalog.HIVE_dataset", readDf, options)
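      To illustrate how the factory keeps store specifics away from callers, here is a deliberately simplified sketch (an assumed structure, not the actual Gimel source): the storage type resolved from the catalog picks the concrete implementation, and callers only ever see read/write on dataset names.

        import org.apache.spark.sql.{DataFrame, SparkSession}

        // Hypothetical, trimmed-down factory; the real gimel.dataset.factory covers Kafka,
        // Elasticsearch, Druid, Hive, Aerospike, HBase, Cassandra and JDBC as listed above.
        trait GimelStorageDataSet {
          def read(dataSetName: String, options: Map[String, Any]): DataFrame
        }

        class HiveDataSet(spark: SparkSession) extends GimelStorageDataSet {
          // Hive needs nothing beyond Spark itself.
          def read(dataSetName: String, options: Map[String, Any]): DataFrame =
            spark.table(dataSetName)
        }

        class KafkaDataSet(spark: SparkSession) extends GimelStorageDataSet {
          // Kafka properties (brokers, topics, schema source, ...) come from the catalog entry.
          def read(dataSetName: String, options: Map[String, Any]): DataFrame =
            spark.read
              .format("kafka")
              .options(options.map { case (k, v) => k -> v.toString })
              .load()
        }

        object DataSetFactory {
          def getFromFactory(storageType: String, spark: SparkSession): GimelStorageDataSet =
            storageType.toLowerCase match {
              case "hive"  => new HiveDataSet(spark)
              case "kafka" => new KafkaDataSet(spark)
              case other   => throw new IllegalArgumentException(s"No connector for storage type: $other")
            }
        }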
  39. 39. Catalog Provider - USER | HIVE | PCATALOG | Your Own Catalog ©2018 PayPal Inc. Confidential and proprietary.
      set gimel.catalog.provider=PCATALOG
        CatalogProvider.getDataSetProperties("dataSetName") resolves the dataset through the PCatalog metadata services.
      set gimel.catalog.provider=USER
        CatalogProvider.getDataSetProperties("dataSetName") uses properties supplied by the user:
        sql> set dataSetProperties={
               "key.deserializer":"org.apache.kafka.common.serialization.StringDeserializer",
               "auto.offset.reset":"earliest",
               "gimel.kafka.checkpoint.zookeeper.host":"zookeeper:2181",
               "gimel.storage.type":"kafka",
               "gimel.kafka.whitelist.topics":"kafka_topic",
               "datasetName":"test_table1",
               "value.deserializer":"org.apache.kafka.common.serialization.ByteArrayDeserializer",
               "value.serializer":"org.apache.kafka.common.serialization.ByteArraySerializer",
               "gimel.kafka.checkpoint.zookeeper.path":"/pcatalog/kafka_consumer/checkpoint",
               "gimel.kafka.avro.schema.source":"CSR",
               "gimel.kafka.zookeeper.connection.timeout.ms":"10000",
               "gimel.kafka.avro.schema.source.url":"http://schema_registry:8081",
               "key.serializer":"org.apache.kafka.common.serialization.StringSerializer",
               "gimel.kafka.avro.schema.source.wrapper.key":"schema_registry_key",
               "gimel.kafka.bootstrap.servers":"localhost:9092"
             }
        sql> select * from pcatalog.test_table1
        or, from Scala:
        spark.sql("set gimel.catalog.provider=USER")
        val dataSetOptions = DataSetProperties(
          "KAFKA",
          Array(Field("payload", "string", true)),
          Array(),
          Map(
            "datasetName" -> "test_table1",
            "auto.offset.reset" -> "earliest",
            "gimel.kafka.bootstrap.servers" -> "localhost:9092",
            "gimel.kafka.avro.schema.source" -> "CSR",
            "gimel.kafka.avro.schema.source.url" -> "http://schema_registry:8081",
            "gimel.kafka.avro.schema.source.wrapper.key" -> "schema_registry_key",
            "gimel.kafka.checkpoint.zookeeper.host" -> "zookeeper:2181",
            "gimel.kafka.checkpoint.zookeeper.path" -> "/pcatalog/kafka_consumer/checkpoint",
            "gimel.kafka.whitelist.topics" -> "kafka_topic",
            "gimel.kafka.zookeeper.connection.timeout.ms" -> "10000",
            "gimel.storage.type" -> "kafka",
            "key.serializer" -> "org.apache.kafka.common.serialization.StringSerializer",
            "value.serializer" -> "org.apache.kafka.common.serialization.ByteArraySerializer"
          )
        )
        dataSet.read("test_table1", Map("dataSetProperties" -> dataSetOptions))
      set gimel.catalog.provider=HIVE
        CatalogProvider.getDataSetProperties("dataSetName") reads the properties from a Hive external table definition:
        CREATE EXTERNAL TABLE `pcatalog.test_table1` (payload string)
        LOCATION 'hdfs://tmp/'
        TBLPROPERTIES (
          "datasetName" -> "dummy",
          "auto.offset.reset" -> "earliest",
          "gimel.kafka.bootstrap.servers" -> "localhost:9092",
          "gimel.kafka.avro.schema.source" -> "CSR",
          "gimel.kafka.avro.schema.source.url" -> "http://schema_registry:8081",
          "gimel.kafka.avro.schema.source.wrapper.key" -> "schema_registry_key",
          "gimel.kafka.checkpoint.zookeeper.host" -> "zookeeper:2181",
          "gimel.kafka.checkpoint.zookeeper.path" -> "/pcatalog/kafka_consumer/checkpoint",
          "gimel.kafka.whitelist.topics" -> "kafka_topic",
          "gimel.kafka.zookeeper.connection.timeout.ms" -> "10000",
          "gimel.storage.type" -> "kafka",
          "key.serializer" -> "org.apache.kafka.common.serialization.StringSerializer",
          "value.serializer" -> "org.apache.kafka.common.serialization.ByteArraySerializer"
        );
        spark-sql> select * from pcatalog.test_table1
        scala> dataSet.read("test_table1", Map("dataSetProperties" -> dataSetOptions))
      set gimel.catalog.provider=YOUR_CATALOG
        CatalogProvider.getDataSetProperties("dataSetName") { // Implement this ! }
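      The "Your Own Catalog" option is the same lookup contract pointed at a metadata source you control; a hypothetical sketch (names and the return shape are illustrative - the real provider returns the DataSetProperties structure shown above):

        // Hypothetical custom provider: resolve dataset properties from a REST service,
        // a config file, a database, etc., and select it with gimel.catalog.provider=YOUR_CATALOG.
        object MyCatalogProvider {
          def getDataSetProperties(dataSetName: String): Map[String, String] = dataSetName match {
            case "test_table1" => Map(
              "gimel.storage.type"             -> "kafka",
              "gimel.kafka.whitelist.topics"   -> "kafka_topic",
              "gimel.kafka.bootstrap.servers"  -> "localhost:9092",
              "gimel.kafka.avro.schema.source" -> "CSR",
              "auto.offset.reset"              -> "earliest"
            )
            case other =>
              throw new NoSuchElementException(s"Dataset not registered in custom catalog: $other")
          }
        }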
  40. 40. Integration with ecosystems ©2018 PayPal Inc. Confidential and proprietary.
      Spark Thrift Server (org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala):
        // result = sqlContext.sql(statement)   <- original SQL execution
        // Integration of Gimel in Spark:
        result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)
      Livy REPL (com/cloudera/livy/repl/SparkSqlInterpreter.scala):
        class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
          private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
          private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
          private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM] (.*)".r
          // ........
          case PCATALOG_BATCH_MAGIC(gimelCode) =>
            GimelQueryProcessor.executeBatch(gimelCode, sparkSession)
          case PCATALOG_STREAM_MAGIC(gimelCode) =>
            GimelQueryProcessor.executeStream(gimelCode, sparkSession)
          case _ =>
          // .....
      Jupyter Notebooks (sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js):
        define(['base/js/namespace'], function(IPython){
          var onload = function() {
            IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg':[/^%%gimel/]};}
          return { onload: onload }})
  41. 41. Systems / Data Stores Supported ©2018 PayPal Inc. Confidential and proprietary. 41
  42. 42. Gimel – Open Source & Future 42©2018 PayPal Inc. Confidential and proprietary.
  43. 43. What’s Next • Query optimization • Open source PCatalog: • Metadata services • Discovery services • Catalog UI • Livy features committed back to open source • Python support • Jupyter features committed back to open source • Open source Gimel (http://try.gimel.io) ©2018 PayPal Inc. Confidential and proprietary.
  44. 44. Gimel - Open Sourced Gimel: http://gimel.io Codebase available: https://github.com/paypal/gimel Slack: https://gimel-dev.slack.com Google Groups: https://groups.google.com/d/forum/gimel-dev ©2017 PayPal Inc. Confidential and proprietary. 44
  45. 45. Acknowledgements 45
  46. 46. Acknowledgements Gimel and PayPal Notebooks team: Andrew Alves Anisha Nainani Ayushi Agarwal Baskaran Gopalan Dheeraj Rampally Deepak Chandramouli Laxmikant Patil Meisam Fathi Salmi Prabhu Kasinathan Praveen Kanamarlapudi Romit Mehta Thilak Balasubramanian Weijun Qian 46
  47. 47. Q&A (10:55 AM) Gimel Codelabs: http://try.gimel.io Slack: https://gimel-dev.slack.com Google Groups: https://groups.google.com/d/forum/gimel-dev 47
  48. 48. Appendix 48©2018 PayPal Inc. Confidential and proprietary.
  49. 49. References Used Images referred: https://www.google.com/search?q=big+data+stack+images&source=lnms&tbm=isch&sa=X&ved=0ahUKEwip1Jz3voPaAhUoxFQKHV33AsgQ_AUICigB&biw=1440&bih=799 49 ©2018 PayPal Inc. Confidential and proprietary.
  50. 50. Spark Thrift Server - Integration (spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala)
        // result = sqlContext.sql(statement)   <- original SQL execution
        // Integration of Gimel in Spark:
        result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)
      ©2018 PayPal Inc. Confidential and proprietary.
  51. 51. Livy - Integration (repl/src/main/scala/com/cloudera/livy/repl/SparkSqlInterpreter.scala) ©2018 PayPal Inc. Confidential and proprietary.
        class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
          private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
          private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
          private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM] (.*)".r
          // ........
          override def execute(code: String, outputPath: String): Interpreter.ExecuteResponse = {
            require(sparkContext != null && sqlContext != null && sparkSession != null)
            code match {
              case SCALA_MAGIC(scalaCode) => super.execute(scalaCode, null)
              case PCATALOG_BATCH_MAGIC(gimelCode) =>
                Try { GimelQueryProcessor.executeBatch(gimelCode, sparkSession) } match {
                  case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
                  case _ => Interpreter.ExecuteError("Failed", " ")
                }
              case PCATALOG_STREAM_MAGIC(gimelCode) =>
                Try { GimelQueryProcessor.executeStream(gimelCode, sparkSession) } match {
                  case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
                  case _ => Interpreter.ExecuteError("Failed", " ")
                }
              case _ =>
              // .....
  52. 52. PayPal Notebooks (Jupyter) - Integration ©2018 PayPal Inc. Confidential and proprietary.
      sparkmagic/sparkmagic/livyclientlib/sqlquery.py:
        def _scala_pcatalog_command(self, sql_context_variable_name):
            if sql_context_variable_name == u'spark':
                command = u'val output= {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""",sparkSession)}}}}'.format(self.query)
            else:
                command = u'val output= {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""",{})}}}}'.format(self.query, sql_context_variable_name)
            if self.samplemethod == u'sample':
                command = u'{}.sample(false, {})'.format(command, self.samplefraction)
            if self.maxrows >= 0:
                command = u'{}.take({})'.format(command, self.maxrows)
            else:
                command = u'{}.collect'.format(command)
            return Command(u'{}.foreach(println)'.format(command + ';\noutput'))
      sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js:
        define(['base/js/namespace'], function(IPython){
          var onload = function() {
            IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg':[/^%%sql/]};
            IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-python'] = {'reg':[/^%%local/]};
            IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg':[/^%%gimel/]};}
          return { onload: onload }
        })
  53. 53. Connectors | High level ©2018 PayPal Inc. Confidential and proprietary. 53
      Kafka (0.10.2) - Batch & stream connectors, implemented from scratch.
      Elasticsearch (5.4.6) - Connector: https://www.elastic.co/guide/en/elasticsearch/hadoop/5.4/spark.html; additional implementations added in Gimel to support daily/monthly partitioned indexes in ES.
      Aerospike (3.1x) - Read: the Aerospike Spark Connector (Aerospark) is used to read data directly into a DataFrame (https://github.com/sasha-polev/aerospark). Write: the Aerospike native Java client Put API is used; for each partition of the DataFrame a client connection is established to write that partition to Aerospike.
      HBase (1.2) - Connector: Hortonworks HBase Connector for Spark (SHC), https://github.com/hortonworks-spark/shc
      Cassandra (2.x) - Connector: DataStax connector, https://github.com/datastax/spark-cassandra-connector
      Hive (1.2) - Leverages Spark APIs under the hood.
      Druid (0.82) - Connector: leverages Tranquility under the hood, https://github.com/druid-io/tranquility
      Teradata / relational - Leverages the JDBC storage handler; supports batch reads/loads, FAST Load & FAST Export.
      Alluxio - Leverages cross-cluster access via reads using the Spark conf spark.yarn.access.namenodes
  54. 54. Dataset Registration Process Flow ©2018 PayPal Inc. Confidential and proprietary. 54 (diagram): a requestor onboards a dataset by filling in metadata and submitting an approval request; once the approver approves (or the request is auto-approved), the Create Dataset REST API creates the dataset metadata in PCatalog and the catalog entry on the storage system. A user/developer then submits a job; the compute layer (Data API) gets the dataset metadata from PCatalog and accesses the data.
  55. 55. Gimel Data Catalog Features ©2018 PayPal Inc. Confidential and proprietary. 55
      Discovery: auto-discover datasets across all data stores.
      Explorer: view available datasets, view schema, view system and object attributes.
      Query and BI integration: integration with Jupyter notebooks, integration with BI tools.
      Dashboard and alerts: operational metrics (stats, refresh time, trends); approvals and audits; admin alerts (capacity issues, data access violations, data classification violations); user alerts (refresh delays, profile anomalies).
