Gimel Data Platform is an analytics platform developed by PayPal to simplify data access and analysis. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses in data access and application lifecycle management, a demo of a sample cancelled-flights use case built with Gimel, and PayPal's plans to open source Gimel.
PayPal Data Lake Journey | 2017-Oct | San Diego | Teradata Edge of Next | Deepak Chandramouli
Gimel [http://www.gimel.io] is a Big Data Processing Library, open sourced by PayPal.
https://www.youtube.com/watch?v=52PdNno_9cU&t=3s
Gimel empowers analysts, scientists, and data engineers alike to access a variety of big data and traditional data stores with just SQL or a single line of code (the Unified Data API).
This is possible via a catalog of technical properties abstracted away from users, along with a rich collection of data store connectors available in the Gimel library.
A catalog provider can be Hive, user-supplied (at runtime), or UDC.
In addition, PayPal recently open sourced UDC [Unified Data Catalog], which can host and serve the technical metadata of data stores and objects. Visit http://www.unifieddatacatalog.io to experience it first hand.
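To make the "single line of code" claim concrete, here is a minimal sketch of the Unified Data API in Scala on Spark. The `com.paypal.gimel.DataSet` entry point follows the shape shown in the public Gimel docs, and the dataset names are hypothetical placeholders for catalog entries.

```scala
// Minimal sketch of the Gimel Unified Data API (Scala on Spark).
// The DataSet entry point follows the public Gimel docs; dataset names are
// hypothetical and resolve against a catalog provider (Hive, runtime, or UDC).
import org.apache.spark.sql.SparkSession
import com.paypal.gimel.DataSet

object GimelApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gimel-sketch").enableHiveSupport().getOrCreate()

    // One entry point, regardless of whether the store is Kafka, HBase, Elastic, JDBC, ...
    val dataSet = DataSet(spark)

    // Read: the catalog supplies connection and serde properties behind this single call.
    val df = dataSet.read("udc.Kafka.Gen.Default.Flights")
    df.show(10)

    // Write has the same shape: the target store is resolved from its catalog entry.
    dataSet.write("udc.Hive.Analytics.Default.FlightsCopy", df)
  }
}
```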
Site | https://www.infoq.com/qconai2018/
Youtube | https://www.youtube.com/watch?v=2h0biIli2F4&t=19s
At PayPal, data engineers, analysts, and data scientists work with a variety of data sources (messaging, NoSQL, RDBMS, documents, TSDB), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL), and execution models (stream, batch, interactive).
Due to this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc., which impacts time-to-market (TTM). To solve this problem and make product development more effective, the PayPal Data Platform team developed "Gimel", a unified analytics data platform that provides access to any storage through a single unified Data API and SQL, both powered by a centralized data catalog.
In this session, we will introduce the various components of Gimel - Compute Platform, Data API, PCatalog, GSQL, and Notebooks. We will provide a demo showing how Gimel reduces TTM by letting our engineers write a single line of code to access any storage without knowing the complexity behind the scenes.
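GSQL can be illustrated the same way: one SQL statement spanning two catalogued stores, pushed through Gimel's query processor. This is a hedged sketch; the `GimelQueryProcessor.executeBatch` call and the dataset names are assumptions based on the open-source Gimel documentation, not a verified internal API.

```scala
// Hedged GSQL sketch: SQL over any catalogued store via Gimel's query processor.
// Class path, method signature, and dataset names are assumptions from the docs.
import org.apache.spark.sql.SparkSession
import com.paypal.gimel.sql.GimelQueryProcessor

object GsqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gsql-sketch").enableHiveSupport().getOrCreate()

    // One statement reads a Kafka-backed dataset and writes an HBase-backed one;
    // Gimel resolves both names through the catalog and picks the right connectors.
    GimelQueryProcessor.executeBatch(
      """INSERT INTO udc.HBase.Analytics.Default.CancelledFlights
        |SELECT * FROM udc.Kafka.Gen.Default.Flights WHERE cancelled = 'Y'""".stripMargin,
      spark)
  }
}
```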
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Unified Data Access with Gimel
Deepak Chandramouli, Engineering Lead
Anisha Nainani, Sr. Software Engineer
Dr. Vladimir Bacvanski, Principal Architect (PayPal)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j | Deepak Chandramouli
Youtube | https://youtu.be/zGX0fRLdd6s?list=PLPaGQXwz_-RaoHicnGhL5SyOAp3_lUTQ2&t=1
This is a talk from PayPal at the NODES Online Summit, organized by Neo4j.
For more session details and the video, please visit this link:
https://neo4j.com/online-summit/session/recommendations-unified-data-catalog-spark-neo4j
Gimel is a data abstraction framework built on Apache Spark, providing unified data access via API & SQL to technologies such as Kafka, Elasticsearch, HBase, REST APIs, files, object stores, relational databases, etc. (a sketch of the runtime-properties flavor of this API follows the links below).
We spoke about this recently in the cloud track at the "Scale By The Bay" conference.
https://www.scale.bythebay.io/schedule
https://sched.co/e55D
Youtube - https://www.youtube.com/watch?v=cy8g2WZbEBI&ab_channel=FunctionalTV
https://youtu.be/m6_0iI4XDpU
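Here is the runtime-properties sketch referenced above: instead of a UDC or Hive catalog entry, the technical properties are handed to the API at call time. The `gimel.dataset.type` key and the options overload of `read` are assumptions; the `es.*` keys are standard elasticsearch-hadoop settings.

```scala
// Sketch: user-supplied (runtime) catalog properties, as an alternative to a
// UDC or Hive catalog entry. Property keys and the options overload of read()
// are assumptions; consult the Gimel connector docs for the exact names.
import org.apache.spark.sql.SparkSession
import com.paypal.gimel.DataSet

object RuntimeCatalogSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("runtime-catalog").getOrCreate()

    // Instead of looking the dataset up in a catalog, hand the technical
    // properties to the API directly at call time.
    val props: Map[String, Any] = Map(
      "gimel.dataset.type" -> "ELASTIC_SEARCH",     // assumed key for the store type
      "es.nodes"           -> "http://elastic-host", // standard elasticsearch-hadoop keys
      "es.port"            -> "9200",
      "es.resource"        -> "flights/cancelled"
    )
    val df = DataSet(spark).read("flights_cancelled_es", props)
    df.show(10)
  }
}
```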
Democratizing data science using Spark, Hive and Druid | DataWorks Summit
MZ is re-inventing how the entire world experiences data via our mobile games division MZ Games Studios, our digital marketing division Cognant, and our live data platform division Satori.
The growing need for data science capabilities across the organization requires an architecture that can democratize both building these applications and disseminating the insights they produce to the wider organization.
Attend this session to learn how we built a platform for data science using Spark, Hive, and Druid, specifically for our performance marketing division Cognant. This platform powers several data science applications, such as fraud detection and bid optimization, at large scale.
We will share lessons learned over the past 3 years of building this platform, and walk through some of the actual data science applications built on top of it.
Attendees from ML engineering and data science backgrounds can gain deep insight from our experience building this platform.
Speakers
Pushkar Priyadarshi, Director of Engineering, Machine Zone Inc.
Igor Yurinok, Staff Software Engineer, MZ
Highly configurable and extensible data processing framework at PubMatic | DataWorks Summit
PubMatic is a leading advertising technology company that processes 500 billion transactions (50 terabytes of data) per day through real-time and batch processing pipelines on a 900-node cluster, to power highly efficient machine learning algorithms, provide real-time feedback to the ad server for optimization, and provide in-depth insights on customer inventory and audience.
At PubMatic, scaling with ever-growing volume has always been the biggest challenge, and we have been optimizing our technology stack for performance and cost. Another challenge is supporting the demand for a variety of reports and analytics from customers and internal stakeholders: writing custom jobs for each analytics request leads to repetitive effort and redundant business logic across many jobs.
To solve these problems, we built a platform for creating configuration-driven data processing pipelines with high re-usability of business functions, extensible enough to adopt cutting-edge technologies in the ever-changing big data ecosystem. The platform enables our development teams to build robust batch data processing pipelines to power analytics dashboards, and it empowers novice users to supply a configuration of facts and dimensions to generate ad-hoc reports in a single data processing job (a sketch of this idea appears below). The framework intelligently identifies and re-uses existing business functions based on user inputs, and provides an abstraction layer that keeps core business logic unaffected by technology changes. It is currently powered by Spark, but it can easily be configured to use other technologies.
The framework reduced the time to develop data processing jobs from weeks to a few days, simplified unit testing and QA automation, and provided simpler interfaces for customers and internal stakeholders to generate custom reports.
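The configuration-driven idea can be sketched with a small, hypothetical Spark job: a declarative config of dimensions and facts is compiled into a groupBy/aggregate plan, so a new report is a new config rather than new code. All names here are invented for illustration and are not PubMatic's actual API.

```scala
// Hypothetical sketch of a configuration-driven report job in Spark (Scala).
import org.apache.spark.sql.{DataFrame, SparkSession, functions => F}

// A declarative report: where to read, what to group by, what to aggregate.
case class ReportConfig(source: String, dimensions: Seq[String], facts: Map[String, String])

object ConfigDrivenReport {
  // Compile the config into a groupBy/agg plan; new reports need no new code.
  def run(spark: SparkSession, cfg: ReportConfig): DataFrame = {
    val input = spark.read.parquet(cfg.source)
    val aggs  = cfg.facts.map { case (col, fn) => F.expr(s"$fn($col)").as(s"${fn}_$col") }.toSeq
    input.groupBy(cfg.dimensions.map(F.col): _*).agg(aggs.head, aggs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("config-report").getOrCreate()
    val cfg = ReportConfig("/data/ad_transactions",          // placeholder path
                           Seq("publisher_id", "geo"),
                           Map("impressions" -> "sum", "revenue" -> "sum"))
    run(spark, cfg).show()
  }
}
```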
Speaker
Kunal Umrigar, Sr. Director Engineering Big Data & Analytics, PubMatic
This is a brief technology introduction to Oracle Stream Analytics, and how to use the platform to develop streaming data pipelines that support a wide variety of industry use cases.
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg... | Databricks
"The modernization of the tobacco industry is resulting in a shift towards a more data-driven approach to trade, operations and the consumer. The need to scale while maintaining margins is paramount, and today’s consumer requires more personalized engagement and value at every interaction to drive sales and revenue.
At Altria, we’re at the forefront of this evolution, leveraging hundreds of terabytes of big data (such as point-of-sale, clickstream, mobile data, and more) and machine learning to improve our ability to make smarter decisions and outpace the competition. This talk recaps our big data journey from a legacy data infrastructure (Teradata), isolated data systems, and the lack of resources which prevented our ability to move quickly and scale, to our current state where we’ve successfully implemented, architected and on-boarded tools and processes in stages of data acquisition, store, prepare, and business intelligence with Azure Data Lake, Azure Databricks, Azure Data factory, APIs Managements, Streaming and Hosting technologies and provided Data Analytics platform.
We’ll discuss the roadblocks we came across, how we overcame them, and how we employed a unified approach to big data and analytics through the fully managed Azure Databricks platform and the Azure suite of tools which allowed us to streamline workflows, improve operational performance, and ultimately introduce new customer experiences that drive engagement and revenue."
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando | Romit Mehta
This is my presentation at the TDWI Leadership Summit. It talks about how products like Gimel, Unified Data Catalog, and PayPal Notebooks help improve data scientist productivity and enable machine learning at scale at PayPal.
This is our presentation of PayPal Notebooks and PPExtensions at JupyterCon 2018 in New York. We talked about PayPal's big data ecosystem, the complexity of analytics with polyglot data stores, and how our open-sourced PPExtensions help abstract those complexities from our data scientists and analysts. With PayPal Notebooks, we are dramatically reducing the time to market for our data scientists to go from research to deployed models.
ppextensions.io
gimel.io
unifieddatacatalog.io
Ping me on LinkedIn if you need more info!
The Convergence of Reporting and Interactive BI on Hadoop | DataWorks Summit
Since the early days of Hive, SQL on Hadoop has evolved from being a SQL wrapper on top of MapReduce to a viable replacement for the traditional EDW. In the meantime, while SQL on Hadoop vendors were busy adding enterprise capabilities and comparing their TPC-DS prowess against Hive, a niche industry emerged on the side for OLAP (a.k.a. "Interactive BI") on Hadoop data. Unlike general-purpose SQL on Hadoop engines, which deal with the multiple aspects of warehousing, including reporting, OLAP on Hadoop engines focus almost exclusively on answering OLAP queries fast by using implementation techniques that had not been part of the SQL on Hadoop toolbox so far.
But SQL on Hadoop engines are not standing still. After having made huge progress in catching up to traditional EDWs for reporting workloads, SQL on Hadoop engines are now setting their sights on Interactive BI. This is great news for enterprises: as the line between reporting and OLAP gets blurred, enterprises can now start considering using a single engine for both reporting and interactive BI on their Hadoop data, as opposed to having to host, manage and license two separate products.
Can a single engine satisfy both your reporting and Interactive BI needs? This may be a hard question to answer. Vendors use inconsistent terminology to describe their products and make ambitious and sometimes conflicting claims. This makes it very hard for enterprises to compare products, let alone decide which is the product that best matches their needs.
In this presentation, we'll provide an overview of the different approaches to OLAP on Hadoop, and explain the key technologies behind each of them. We'll use consistent terminology to describe what you get from multiple proprietary and open source products, and outline advantages and disadvantages. You'll come out equipped with the knowledge you need to read past marketing and sales pitches; you'll be able to compare products and make an informed decision on whether a single engine for both reporting and Interactive BI on Hadoop is right for you.
Speaker
Gustavo Arocena, Big Data Architect, IBM
Modern data management using Kappa and streaming architectures, including a discussion by eBay's Connie Yang about the Rheos platform and the use of Oracle GoldenGate, Kafka, Flink, etc.
A presentation discussing a major shift in enterprise data management: the movement away from the older hub-and-spoke data architecture and towards the newer, more modern Kappa data architecture.
Accelerating query processing with materialized views in Apache Hive | DataWorks Summit
Over the last few years, the Apache Hive community has been working on advancements that enable a whole new range of use cases, moving the project from its batch processing roots towards an interactive SQL query answering platform. Traditionally, one of the most powerful techniques for accelerating query processing in data warehouses is the precomputation of relevant summaries, or materialized views.
This talk presents our work on introducing materialized views, and automatic query rewriting based on those materializations, in Apache Hive. In particular, materialized views can be stored natively in Hive or in other systems such as Druid using custom storage handlers, and they can seamlessly exploit exciting new Hive features such as LLAP acceleration. The optimizer then relies on Apache Calcite to automatically produce full and partial rewritings for a large set of query expressions comprising projections, filters, joins, and aggregation operations. We describe the current coverage of the rewriting algorithm, how Hive controls important aspects of the life cycle of materialized views such as the freshness of their data, and interesting directions for future improvements. We include an experimental evaluation highlighting the benefits that materialized views can bring to the execution of Hive workloads.
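For readers unfamiliar with the feature, this sketch shows the general shape of Hive 3.x materialized views and rewriting, driven over JDBC from Scala. The connection URL and table names are placeholders, and it assumes the Hive JDBC driver is on the classpath and the base table is a managed (ACID) table.

```scala
// Sketch of Hive 3.x materialized views over JDBC; names/URL are placeholders.
import java.sql.DriverManager

object HiveMvSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:hive2://hive-server:10000/default")
    val stmt = conn.createStatement()

    // Precompute a summary; Hive stores it natively (or in Druid via a
    // storage handler). The base table must be a managed ACID table in Hive 3.
    stmt.execute(
      """CREATE MATERIALIZED VIEW sales_by_day AS
        |SELECT sold_date, SUM(amount) AS total FROM sales GROUP BY sold_date""".stripMargin)

    // With rewriting on, Calcite can answer matching queries from the view
    // (including partial rewrites, like this filtered aggregate) instead of
    // scanning the base table.
    stmt.execute("SET hive.materializedview.rewriting=true")
    val rs = stmt.executeQuery(
      "SELECT sold_date, SUM(amount) FROM sales WHERE sold_date >= '2018-01-01' GROUP BY sold_date")
    while (rs.next()) println(s"${rs.getString(1)} -> ${rs.getDouble(2)}")
    conn.close()
  }
}
```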
Speaker
Jesus Camacho Rodriguez, Member of Technical Staff, Hortonworks
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
Liberty Mutual Enterprise Data Lake Use Case Study
By building a data lake, Liberty Mutual Insurance Group's Enterprise Analytics department has created a platform for implementing various big data analytics projects. We will share our journey and how we leveraged the Hortonworks Hadoop distribution and other open source technologies to meet our project needs. This session will cover data lake architecture, security, and use cases.
Securing and governing a multi-tenant data lake within the financial industry | DataWorks Summit
Standard Bank South Africa is a Hortonworks client, with several multi-node clusters hosting Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF). This presentation discusses the technical detail of implementing security, governance, and multi-tenancy on a data lake within the finance industry. The talk addresses the team's experiences, challenges, failures, and the lessons we took away from this behemoth of an adventure.
After introducing Standard Bank and the Hadoop admin team, the presentation will describe the security and governance journey Standard Bank has undergone since the project's inception in 2015, as well as the roadmap for the future ahead.
Presentation structure:
1. Team introduction with background information
2. Environment overview (Where we are - Current)
-----Security
---------Authentication through Kerberos and LDAP/AD (see the sketch after this outline)
---------Authorization through Ranger and Centrify
---------Transparent Data Encryption (TDE) at rest
-----Governance
---------Centralized auditing
---------Ranger policies and data steward ownership
-----Multi-Tenancy
---------Data lake vs. data analytics platform
---------Edge nodes vs. API framework through Knox
3. How did we get to this stage? (Past)
-----Challenges faced (Kerberos, AD integration, SSL)
-----How we overcame these challenges
4. Future challenges we foresee (Future)
-----How we are planning to prepare for them
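To make the authentication item above concrete, here is a minimal sketch of programmatic Kerberos login for a Hadoop client using Hadoop's `UserGroupInformation`. The principal, keytab path, and HDFS path are placeholders for per-tenant values, and the cluster's core-site/hdfs-site configs are assumed to be on the classpath.

```scala
// Sketch: programmatic Kerberos (keytab) login for a multi-tenant Hadoop client.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object KerberizedClient {
  def main(args: Array[String]): Unit = {
    // Assumes core-site.xml/hdfs-site.xml on the classpath point at the cluster.
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)

    // Each tenant authenticates with its own principal/keytab; Ranger then
    // authorizes what the authenticated identity may read or write.
    UserGroupInformation.loginUserFromKeytab(
      "tenant1@EXAMPLE.COM", "/etc/security/keytabs/tenant1.keytab") // placeholders

    val fs = FileSystem.get(conf)
    fs.listStatus(new Path("/data/tenant1")).foreach(s => println(s.getPath))
  }
}
```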
Speakers
Ian Pillay, Hadoop Administrator, Standard Bank
Brad Smith, Hadoop Administrator, Standard Bank
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven analytics, machine learning, and data lakes, which is where data management tech really shines. Join us for this presentation, where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
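As a rough sketch of one pattern named above: with a Transaction Outbox, the business row and its event row are written in a single local transaction, and a CDC/replication tool such as GoldenGate then ships the outbox rows downstream. The schema and JDBC URL below are invented for illustration.

```scala
// Hypothetical Transaction Outbox sketch (PostgreSQL JDBC; names are placeholders).
import java.sql.DriverManager

object TransactionOutbox {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:postgresql://db/orders", "app", "secret")
    conn.setAutoCommit(false)
    try {
      // Business write.
      val order = conn.prepareStatement("INSERT INTO orders(id, total) VALUES (?, ?)")
      order.setLong(1, 42L)
      order.setBigDecimal(2, new java.math.BigDecimal("19.99"))
      order.executeUpdate()

      // Same transaction: the event row that CDC/replication will pick up.
      val outbox = conn.prepareStatement(
        "INSERT INTO outbox(aggregate_id, event_type, payload) VALUES (?, ?, ?::jsonb)")
      outbox.setLong(1, 42L)
      outbox.setString(2, "OrderCreated")
      outbox.setString(3, """{"id":42,"total":19.99}""")
      outbox.executeUpdate()

      conn.commit() // both rows or neither: no dual-write inconsistency
    } catch { case e: Exception => conn.rollback(); throw e }
    finally conn.close()
  }
}
```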
Learn about data lifecycle best practices in the AWS Cloud, so you can optimize performance and lower the costs of data ingestion, staging, storage, cleansing, analytics and visualization, and archiving.
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake | Rittman Analytics
In this session, we'll look at the role of the data engineer in designing, provisioning, and enabling an Oracle Cloud data lake using Oracle Analytics Cloud Data Lake Edition. We’ll also examine the use of data flow and data pipeline authoring tools and how machine learning and AI can be applied to this task. Furthermore, we’ll explore connecting to database and SaaS sources along with sources of external data via Oracle Data-as-a-Service. Finally we’ll delve into how traditional Oracle Analytics developers can transition their skills into this role and start working as data engineers on Oracle Public Cloud data lake projects.
Adding structure to your streaming pipelines: moving from Spark streaming to ... | DataWorks Summit
How do you go from a strictly typed object-based streaming pipeline with simple operations to a structured streaming pipeline with higher order complex relational operations? This is what the Data Engineering team did at GoPro to scale up the development of streaming pipelines for the rapidly growing number of devices and applications.
When big data frameworks such as Hadoop first appeared, developers were happy because we could finally process large amounts of data without writing complex multi-threaded code or, worse yet, complicated distributed code. Unfortunately, only very simple operations such as map and reduce were available. Almost immediately, higher-level operations similar to relational operations were desired, and so Hive and dozens (hundreds?) of SQL-based big data tools became available for more developer-efficient batch processing of massive amounts of data.
In recent years, big data has moved from batch processing to stream-based processing, since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today, and the same trend that occurred in batch-based big data processing has taken place in the streaming world, so that nearly every streaming framework now supports higher-level relational operations.
In this talk, we will discuss in a very hands-on manner how the streaming data pipelines for GoPro devices and apps have moved from the original Spark Streaming, with its simple RDD-based operations in Spark 1.x, to Spark's Structured Streaming, with its higher-level relational operations in Spark 2.x. We will talk about the differences, advantages, and necessary pain points that must be addressed in order to scale relational streaming pipelines for massive IoT streams. We will also talk about moving from "hand-built" Hadoop/Spark clusters running in the cloud to a Spark-based cloud service.
Speakers
David Winters, Big Data Architect, GoPro
Hao Zou, Senior Software Engineer, GoPro
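The migration described in this abstract can be sketched in a few lines: in Spark 2.x Structured Streaming, a Kafka feed becomes an unbounded DataFrame, so windowed relational aggregations replace hand-rolled RDD transformations. The broker, topic, and event schema below are placeholders.

```scala
// Sketch: Kafka consumed via Spark 2.x Structured Streaming with relational ops.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, from_json, window}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType, TimestampType}

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("device-stream").getOrCreate()
    import spark.implicits._

    // Placeholder event schema for JSON device telemetry.
    val schema = new StructType()
      .add("device", StringType)
      .add("metric", DoubleType)
      .add("ts", TimestampType)

    // The stream is an unbounded DataFrame, so select/groupBy/window replace
    // the RDD transformations of Spark 1.x DStreams.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
      .option("subscribe", "device-events")             // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", schema).as("e"))
      .select("e.*")

    // Windowed aggregate with a watermark to bound state for late events.
    val perDevice = events
      .withWatermark("ts", "10 minutes")
      .groupBy(window($"ts", "1 minute"), $"device")
      .agg(avg($"metric").as("avg_metric"))

    perDevice.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}
```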
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth? | SnapLogic
Companies collect more data than ever but struggle to glean the best insights from it. Making effective use of machine learning also requires powerful data integration.
In this presentation, Janet Jaiswal, SnapLogic's VP of product marketing, reviews key strategies and technologies to deliver intelligent data via self-service ML models.
To learn more, visit https://www.snaplogic.com
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M... | apidays
apidays LIVE Paris - Responding to the New Normal with APIs for Business, People and Society
December 8, 9 & 10, 2020
Data with a mission: a COVID-19 API case study
Matt McLarty, Global Leader of API Strategy at MuleSoft
Sanjna Verma, Product Manager at Salesforce
apidays LIVE Australia 2020 - Data with a Mission by Matt McLarty | apidays
apidays LIVE Australia 2020 - Building Business Ecosystems
Data with a Mission: A COVID-19 API Case Study
Matt McLarty, Global Leader, API Strategy & Sanjna Verma, Product Manager at MuleSoft
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim... | apidays
apidays LIVE New York 2021 - API-driven Regulations for Finance, Insurance, and Healthcare
July 28 & 29, 2021
Simplify Open Policy Agent with Styra DAS
Tim Hinrichs, Co-Founder & CTO at Styra
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate | Amazon Web Services
Containers and other forms of dynamic infrastructure can prove challenging to monitor. How do you define “normal” when your infrastructure is intentionally in motion and changing every minute, or when there are no hosts to monitor at all? Join us as we share proven strategies for monitoring your containerized infrastructure on AWS, Amazon ECS, and AWS Fargate. This session is brought to you by AWS Partner, Datadog.
Motadata - Unified Product Suite for IT Operations and Big Data Analytics | novsela
Motadata is a unified IT infrastructure monitoring, log & flow management, and IT service management platform. It offers operational insights into your IT infrastructure and its performance, and is designed to identify and resolve complex problems faster, ensuring 100% uptime of all business-critical components. Motadata enables you to make more informed business decisions by offering complete visibility into the health and key performance indicators (KPIs) of IT services. It helps reduce CAPEX, offers the agility to resolve issues faster, is compatible with hybrid ecosystems, and integrates easily with existing and future platforms.
In summary, with Motadata, Mindarray Systems offers the perfect solution needed to confidently handle the challenges of today’s increasingly complex business operations and IT infrastructure management.
For more information: nov.sela@gmail.com
Pivotal Big Data Suite: A Technical Overview | VMware Tanzu
How and why are companies like Uber, Netflix, and Airbnb so successful, what you need to do in order to become successful in the same way that they are, and how Pivotal can help you with that.
Speaker: Les Klein, EMEA CTO Data, Pivotal
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ... | Amazon Web Services
Tracing is always a challenge, no matter what your architecture is. Creating an application with serverless functions, such as with AWS Lambda, provides agility and scalability to your application, but it also creates an added challenge for code tracing. In this session, we review Datadog's distributed tracing capabilities and how Trek10 uses those capabilities to improve its customers’ applications. Learn how to use AWS X-Ray in a serverless environment. Also, learn strategies for working with traces and logs that explain application errors. Finally, learn how Trek10 uses AWS X-Ray with Datadog to measure and improve its applications' performance. This session is brought to you by AWS partner, Datadog.
Why You Need Manageability Now More than Ever and How to Get It | Gustavo Rene Antunez
Whether you are operating in a completely on-premises environment or have some kind of hybrid cloud setup, you need to be able to clearly monitor and manage your entire organization in one single, unified structure. In this session learn how IOUG’s volunteer team decided to review Oracle Management Cloud Services to see if this “single pane of glass” was up to the challenge of providing the information data professionals need to serve their organization. Come and see how to put the pieces together, illustrated with real examples from Oracle Public Cloud services.
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand... | DataStax Academy
Learn how Apache Cassandra can be paired with Apache Spark to build a powerful OLAP solution for analyzing large-scale operational data from the Internet of Complex Things. Cassandra provides a scalable, flexible, and highly available data store; Spark provides a fast and scalable analytics framework. The two can be combined into a powerful and flexible solution for analyzing IoT data. We will discuss what kinds of analytics Cassandra can handle by itself and when Spark is required, and share some of the challenges and trade-offs that are important to know.
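A minimal sketch of the pairing, using the DataStax spark-cassandra-connector's DataFrame API; the keyspace, table, and host names are illustrative.

```scala
// Sketch: Cassandra as the store, Spark as the analytics engine.
import org.apache.spark.sql.SparkSession

object CassandraOlapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iot-olap")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Cassandra answers key-based lookups by itself; Spark is brought in for
    // scans, joins, and aggregations across partitions.
    val logs = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "machine_data", "table" -> "device_logs")) // illustrative
      .load()

    logs.groupBy("device_type").count().show()
  }
}
```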
Video | https://youtu.be/V5ukRSqcmYY
Event | https://www.alluxio.io/data-orchestration-summit-2020/
Talk Link | https://www.alluxio.io/resources/videos/unified-data-access-with-gimel/
CodeCamp Iasi - Creating a serverless data analytics system on GCP using BigQuery | Márton Kodok
Teaser: give developers a new way of understanding advanced analytics and of choosing the right cloud architecture.
The new buzzword is #serverless, as there are many great services that help us abstract away the complexity associated with managing servers. In this session we will see how serverless helps on large data analytics backends.
We will see how to architect for the cloud and how to add, to an existing project, components that take us to a #serverless architecture: ingesting our streaming data and running advanced analytics on petabytes of data using BigQuery on Google Cloud Platform, all alongside an existing stack and without being forced to re-engineer the app.
BigQuery enables super-fast SQL/JavaScript queries against petabytes of data using the processing power of Google's infrastructure. We will cover its core features, the SQL 2011 standard, working with streaming inserts, user-defined functions written in JavaScript, referencing external JS libraries, and several use cases for the everyday backend developer: funnel analytics, email heatmaps, custom data processing, building dashboards, extracting data using JS functions, and emitting rows based on business logic.
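As a small taste of the workflow described above, this sketch runs a standard-SQL aggregation through the Google Cloud BigQuery Java client from Scala; the project, dataset, and table names are placeholders.

```scala
// Sketch: querying BigQuery from Scala via the Google Cloud Java client.
import com.google.cloud.bigquery.{BigQueryOptions, QueryJobConfiguration}
import scala.jdk.CollectionConverters._

object BigQuerySketch {
  def main(args: Array[String]): Unit = {
    // Credentials/project come from the default application environment.
    val bigquery = BigQueryOptions.getDefaultInstance.getService

    // Standard SQL over large-scale storage; no cluster to provision.
    val query = QueryJobConfiguration.newBuilder(
        "SELECT user_id, COUNT(*) AS events " +
        "FROM `my-project.analytics.events` " + // placeholder table
        "GROUP BY user_id ORDER BY events DESC LIMIT 10")
      .setUseLegacySql(false)
      .build()

    bigquery.query(query).iterateAll().asScala.foreach { row =>
      println(s"${row.get("user_id").getStringValue}: ${row.get("events").getLongValue}")
    }
  }
}
```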
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation technology capabilities of UiPath. Hosted by our local partner Marc Ellis, you will also enjoy a half-day packed with industry insights and networking with automation peers.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Enhancing Performance with Globus and the Science DMZ | Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Accelerate your Kubernetes clusters with Varnish Caching | Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA Connect | Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI support | Alan Dix
Paper presented at the SYNERGY workshop at AVI 2024, Genoa, Italy, 3rd June 2024.
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Removing Uninteresting Bytes in Software Fuzzing | Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
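To make the seed-trimming idea concrete, here is a minimal, illustrative Python sketch. This is not DIAR's actual algorithm (which is described in the paper); it is closer in spirit to AFL's own afl-tmin: it greedily drops any byte whose removal leaves the target's edge coverage, as reported by AFL's afl-showmap tool, unchanged. The target command and the use of afl-showmap are assumptions made for the example.

import os
import subprocess
import tempfile

def coverage_of(seed: bytes, target_cmd: list) -> frozenset:
    # Run the target once on the seed and return its edge-coverage set,
    # parsed from afl-showmap's "edge_id:hit_count" output lines.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(seed)
        seed_path = f.name
    try:
        out = subprocess.run(
            ["afl-showmap", "-o", "/dev/stdout", "--"] + target_cmd + [seed_path],
            capture_output=True, text=True)
        return frozenset(line.split(":")[0]
                         for line in out.stdout.splitlines() if ":" in line)
    finally:
        os.unlink(seed_path)

def trim_seed(seed: bytes, target_cmd: list) -> bytes:
    # Greedy pass: drop each byte whose removal leaves coverage unchanged.
    baseline = coverage_of(seed, target_cmd)
    trimmed = bytearray(seed)
    i = 0
    while i < len(trimmed):
        candidate = bytes(trimmed[:i]) + bytes(trimmed[i + 1:])
        if coverage_of(candidate, target_cmd) == baseline:
            del trimmed[i]   # uninteresting byte: remove, re-check same index
        else:
            i += 1           # byte affects coverage: keep it
    return bytes(trimmed)

# Hypothetical usage: produce a lean seed for an instrumented xmllint build
# lean = trim_seed(open("seed.xml", "rb").read(), ["./xmllint"])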
Essentials of Automations: The Art of Triggers and Actions in FME | Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT Security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks, as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on countries – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 | 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object Calisthenics | Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
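As a taste of how such constraints bite in practice, here is a small, illustrative Python sketch (the names are invented for the example, not taken from the talk): the calisthenics rule "wrap all primitives and strings" pushes a bare amount out of the domain model and into a proper DDD value object, which then becomes the natural home for domain invariants.

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen gives the immutability a value object needs
class Money:
    amount_cents: int
    currency: str

    def __post_init__(self):
        # Wrapping the primitive gives the domain invariants a home
        if self.amount_cents < 0:
            raise ValueError("amount must be non-negative")
        if len(self.currency) != 3:
            raise ValueError("currency must be a 3-letter ISO 4217 code")

    def add(self, other: "Money") -> "Money":
        # The guard clause keeps us at one level of indentation,
        # another calisthenics rule
        if self.currency != other.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount_cents + other.amount_cents, self.currency)

# total = Money(1999, "EUR").add(Money(501, "EUR"))  # 2500 cents, EUR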
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... | DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
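As a flavour of the Python binding mentioned above, here is a minimal sketch using the pypowsybl package, based on its public documentation (treat the exact names as assumptions): it loads a bundled IEEE 14-bus test network and runs an AC power flow, the kind of exercise the interactive notebook walks through.

import pypowsybl as pp

# Create one of the bundled example networks (no input files required)
network = pp.network.create_ieee14()

# Run an AC power flow and check convergence per connected component
results = pp.loadflow.run_ac(network)
print(results[0].status)

# Network data is exposed as pandas DataFrames
print(network.get_buses().head())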
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori | Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Climate Impact of Software Testing at Nordic Testing Days | Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize our carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
5. PayPal Big Data Platform
- 160+ PB of data
- 75,000+ YARN jobs/day
- One of the largest Aerospike, Teradata, Hortonworks, and Oracle installations
- Compute supported: MR, Pig, Hive, Spark, Beam
- 13 prod clusters, 12 non-prod clusters
- GPU co-located with Hadoop
6. Gimel Data Platform (architecture overview)
- Users: developers, data scientists, analysts, operators
- User experience and access: Gimel SDK, Notebooks, R Studio, BI tools
- Compute framework and APIs: PCatalog, Data API
- Application lifecycle management: logging, monitoring, alerting, security
- Infrastructure services leveraged for elasticity and redundancy: multi-DC, public cloud, predictive resource allocation
22. Q&A
Github: http://gimel.io
Try it yourself: http://try.gimel.io
Slack: https://gimel-dev.slack.com
Google Groups: https://groups.google.com/d/forum/gimel-dev