Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j

This document discusses using recommendation systems in the Unified Data Catalog (UDC) to help users discover relevant datasets. It outlines how recommendation engines have benefited Amazon and Netflix by generating personalized suggestions. The architecture uses Neo4j and Spark to build a graph of user, dataset and metadata relationships that powers the recommendations. Future plans include expanding the graph with additional data sources to improve recommendations and to enable new use cases around privacy, compliance and data lineage.

Recommendation System on UDC – Powered by Neo4j & Spark
Harsh Bhimani
Deepak Chandramouli
https://youtu.be/1tdgxJJkbm8

AGENDA
• What is Unified Data Catalog
• Why Recommendations in UDC
• Recommendations – Quick Peek
• Architecture
• Future
• Questions

Metadata Discovery Across the Enterprise

A Dataset – Business and Technical Metadata

UDC architecture (diagram):
• UDC UI
• REST API endpoints: Discovery Services, Metadata Services
• Metadata Database: MySQL + Elastic + Graph*
• Stores: PIT, many more stores…

UDC – Growth Story
• ~1000 users
• 30+ global locations
• 100+ delivery orgs
• 250+ data stores
• 1.7M+ datasets
• Config Data Catalog for Analytics → Enterprise Data Catalog in 20 months

Why Use Recommendation Systems?
Let’s look at two use cases to understand the power of recommendations.

Amazon
• Uses recommendations as targeted marketing throughout its website.
• The store changes radically based on the customer’s interests.
According to McKinsey & Company, 35% of Amazon.com’s revenue is generated by its recommendation engine.
https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers

Netflix
• Uses recommendations to generate the top 10 titles for each user household.
• Uses customer feedback as a signal in its engine so that recommendations become more personalized.
According to McKinsey & Company, 75% of what users watch on Netflix comes from product recommendations.
https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers

How Can Recommendation Systems Help UDC?
UDC user base:
• Data Scientists/Analysts
• Developers
• Administrators
• *GDPR/Privacy/Security
Long tail effect: user + dataset details => personalized recommendations.

Recommended Datasets – New User (the “cold start” problem)

Recommendation Systems
• Matrix-based: cold start problem!
• Graph-based: cold start problem solved
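The matrix-vs-graph contrast above can be made concrete with a toy example. The sketch below is plain Python with made-up users and datasets and a hypothetical `manager_of` mapping (standing in for the user-manager relationships the deck lists as a graph input); it is an illustration under those assumptions, not the UDC implementation. A user-based collaborative filter over the user × dataset interaction matrix has nothing to work with for a user with no history, whereas a one-hop walk through the org graph still yields candidates.

```python
from collections import defaultdict

# Toy, hypothetical data: datasets each user has accessed in the catalog.
usage = {
    "alice": {"sales_daily", "orders"},
    "bob": {"orders", "refunds"},
    "carol": {"sales_daily", "inventory"},
    "dave": set(),  # brand-new user with no history: the cold-start case
}

# Hypothetical user -> manager mapping (e.g. derived from org data).
manager_of = {"bob": "alice", "carol": "alice", "dave": "alice"}


def rank(scores, k):
    """Highest score first; ties broken alphabetically for stable output."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def matrix_recommend(user, usage, k=3):
    """User-based collaborative filtering on the user x dataset matrix.

    Other users' unseen datasets are weighted by how many datasets they
    share with `user`. With an empty history every overlap is 0, so
    nothing can be recommended."""
    seen = usage.get(user, set())
    scores = defaultdict(int)
    for other, datasets in usage.items():
        if other == user:
            continue
        overlap = len(seen & datasets)
        if overlap == 0:
            continue  # no shared history, so no signal from this user
        for ds in datasets - seen:
            scores[ds] += overlap
    return rank(scores, k)


def graph_recommend(user, usage, manager_of, k=3):
    """Graph walk: user -> manager -> manager and teammates -> datasets.

    Uses org relationships instead of the user's own history, so it
    still returns candidates for a new user."""
    manager = manager_of.get(user)
    if manager is None:
        return []
    neighbours = {u for u, m in manager_of.items() if m == manager and u != user}
    neighbours.add(manager)
    seen = usage.get(user, set())
    scores = defaultdict(int)
    for person in neighbours:
        for ds in usage.get(person, set()) - seen:
            scores[ds] += 1  # one vote per neighbour who used the dataset
    return rank(scores, k)


print(matrix_recommend("dave", usage))             # []
print(graph_recommend("dave", usage, manager_of))  # ['orders', 'sales_daily', 'inventory']
```

Counting neighbour votes is the simplest possible scoring choice; a production system would likely weight edges (for example by recency or access frequency) and blend both signals once a user has built up some history.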

Neo4j & Spark
• Easy bootstrap
• Cypher
• Spark APIs
• Rich graph algorithms
• Nice visualization
• APOC procedures
• Distributed processing
• Spark ML + GraphX
• GIMEL + Spark – data access simplified

Building the Graph → Recommendations (diagram)
• Inputs: SSO Org. Data, User-Manager Rel., UDC Metadata, UDC Data, NB Logs
• Cleaning & Creating the graph
• Recommendations, served through an API call from the UDC UI
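The graph-building flow on this slide (sources, then cleaning and creating, then recommendations) can be sketched in a few lines of plain Python. Everything here is an illustrative assumption: the input tuples stand in for SSO org data and NB logs, the relationship names `REPORTS_TO`, `USED` and `USED_BY` are made up, and an adjacency dict plays the role of the graph store. The `also_used` function is a hypothetical stand-in for what an API call from the UDC UI might compute.

```python
from collections import defaultdict

# Hypothetical raw rows standing in for two of the sources on the slide.
sso_org = [("Alice ", "Erin"), ("bob", "erin"), ("", "erin")]  # (user, manager)
nb_logs = [("ALICE", "orders"), ("bob", "orders"),
           ("bob", "refunds"), ("alice", None)]                # (user, dataset)


def clean_user(name):
    """Normalise user IDs so the same person maps to one node; None if blank."""
    if not name or not name.strip():
        return None
    return name.strip().lower()


def build_graph(org_rows, log_rows):
    """Cleaning & creating: drop malformed rows, emit typed adjacency sets."""
    graph = defaultdict(set)
    for user, manager in org_rows:
        u, m = clean_user(user), clean_user(manager)
        if u and m:
            graph[("REPORTS_TO", u)].add(m)
    for user, dataset in log_rows:
        u = clean_user(user)
        if u and dataset:
            graph[("USED", u)].add(dataset)
            graph[("USED_BY", dataset)].add(u)  # reverse edge for traversal
    return graph


def also_used(graph, user, k=5):
    """'Users who used your datasets also used...': a two-hop walk
    user -> dataset -> other user -> other dataset."""
    mine = graph.get(("USED", user), set())
    scores = defaultdict(int)
    for ds in mine:
        for peer in graph.get(("USED_BY", ds), set()):
            if peer == user:
                continue
            for other in graph.get(("USED", peer), set()) - mine:
                scores[other] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


graph = build_graph(sso_org, nb_logs)
print(graph[("REPORTS_TO", "alice")])  # {'erin'}
print(also_used(graph, "alice"))       # ['refunds']
```

Normalising IDs before creating nodes matters because the same person can appear with different casing or stray whitespace across sources; without that step the walk would treat "ALICE" and "alice" as two unrelated users.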

What’s next?
Expanding Relationships
• Workday – Org Structure
• Identity / LDAP – Access controls
• Databases – Owners, Users, Query Logs
• Query Logs – Lineage
• Wiki – Documents, Mentions
• JIRA – Issues, Mentions
• Slack – Chats, Threads
Expanding Use Cases
• GDPR
• Compliance
• Privacy
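The planned “Query Logs – Lineage” relationships pair naturally with the GDPR and privacy use cases: once lineage edges exist, finding every dataset derived from one that holds personal data becomes a graph reachability question. A minimal breadth-first sketch, assuming lineage has already been extracted as hypothetical (source, derived) dataset pairs; the dataset names are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source dataset, derived dataset),
# e.g. as they might be mined from query logs.
lineage = [
    ("raw.customers", "stg.customers"),
    ("stg.customers", "mart.customer_ltv"),
    ("raw.orders", "mart.customer_ltv"),
    ("raw.orders", "mart.daily_sales"),
]


def downstream(edges, start):
    """Return every dataset reachable from `start` (breadth-first search)."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in reached:  # guards against cycles and revisits
                reached.add(child)
                queue.append(child)
    return reached


# If raw.customers were flagged as holding personal data,
# these derived datasets would be candidates for the same review:
print(sorted(downstream(lineage, "raw.customers")))
# ['mart.customer_ltv', 'stg.customers']
```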


Product School

IoT Analytics Company Presentation May 2024

IoTAnalytics

Join us as we dive into the latest updates to the UiPath Orchestrator API, including new limits and features for 2024. Discover how these changes can enhance your automation projects and streamline your workflows. 📚 Overview of UiPath Orchestrator API 🔧 Recent changes to API limits 🛠️ How to adapt to new limits 📋 Best practices for using the Orchestrator API efficiently ❓ Q&A session

Exploring UiPath Orchestrator API: updates and limits in 2024 🚀

DianaGray10

The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.

Search and Society: Reimagining Information Access for Radical Futures

Bhaskar Mitra

💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™: See how to accelerate model training and optimize model performance with active learning Learn about the latest enhancements to out-of-the-box document processing – with little to no training required Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath. Speakers: 👨‍🏫 Andras Palfi, Senior Product Manager, UiPath 👩‍🏫 Lenka Dulovicova, Product Program Manager, UiPath

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...

UiPathCommunity

Discover the essentials of performance testing in the IT sector with our concise guide. Learn about various testing types such as load, stress, endurance, spike, scalability, and volume testing. Understand key performance metrics like response time, throughput, CPU and memory utilization, and error rate. Explore top tools like Apache JMeter, LoadRunner, Gatling, Neoload, and BlazeMeter. Gain insights into best practices for defining objectives, creating realistic scenarios, automating tests, and optimizing performance to ensure user satisfaction, reliability, scalability, and cost efficiency. Ideal for developers, QA engineers, and IT professionals. Visit Expeed Software for more information. https://expeed.com/

In-Depth Performance Testing Guide for IT Professionals

Expeed Software

Unlock the mysteries of successful Salesforce interviews in this insightful session hosted by Hugo Rosario (Salesforce Customer), a seasoned hiring manager that leads the Salesforce Department of multinational company with over 100 interviews under their belt. Step into the manager's chair and gain exclusive behind-the-scenes insights into what makes a Salesforce consultant stand out during the interview process. From deciphering the unspoken cues to mastering key strategies, we'll explore the intricacies of the interview process and provide practical tips for consultants looking to not only pass interviews but also thrive in their roles. Whether you're a seasoned professional or just starting your Salesforce journey, this session is your backstage pass to the secrets that hiring managers wish you knew.

Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...

CzechDreamin

Mission to Decommission: Importance of Decommissioning Products to Increase E...

Product School

Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application. In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics. Length: 30 minutes Session Overview ------------------------------------------- During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana: - What out-of-the-box solutions are available for real-time monitoring JMeter tests? - What are the benefits of integrating InfluxDB and Grafana into the load testing stack? - Which features are provided by Grafana? - Demonstration of InfluxDB and Grafana using a practice web application To view the webinar recording, go to: https://www.rttsweb.com/jmeter-integration-webinar

JMeter webinar - integration with InfluxDB and Grafana

RTTS

Welcome to UiPath Test Automation using UiPath Test Suite series part 2. In this session, we will cover API test automation along with a web automation demo. Topics covered: Test Automation introduction API Example of API automation Web automation demonstration Speaker Pathrudu Chintakayala, Associate Technical Architect @Yash and UiPath MVP Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP

UiPath Test Automation using UiPath Test Suite series, part 2

DianaGray10

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx

Abida Shariff

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 1

Assuring Contact Center Experiences for Your Customers With ThousandEyes

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder

Demystifying gRPC in .Net by John Staveley

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...

When stars align: studies in data quality, knowledge graphs, and machine lear...

UiPath Test Automation using UiPath Test Suite series, part 3

Powerful Start- the Key to Project Success, Barbara Laskowska

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...

IoT Analytics Company Presentation May 2024

Exploring UiPath Orchestrator API: updates and limits in 2024 🚀

Search and Society: Reimagining Information Access for Radical Futures

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...

In-Depth Performance Testing Guide for IT Professionals

Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...

Mission to Decommission: Importance of Decommissioning Products to Increase E...

JMeter webinar - integration with InfluxDB and Grafana

UiPath Test Automation using UiPath Test Suite series, part 2

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx

Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j

1. Recommendation System on UDC – Powered by Neo4J & Spark. Harsh Bhimani, Deepak Chandramouli. https://youtu.be/1tdgxJJkbm8

2. AGENDA • What is Unified Data Catalog • Why Recommendations in UDC • Recommendations – Quick Peek • Architecture • Future • Questions

3. UDC (Unified Data Catalog)

4. Metadata Discovery – Across the Enterprise

5. A Dataset – Business and Technical Metadata

6. UDC components: UDC UI • {REST API Endpoints} • Discovery Services • Metadata Services • Metadata Database (MySQL + Elastic + Graph*) • Data stores: PIT and many more…

7. Why Recommendations in UDC

8. UDC - Growth Story: ~1000 users • 30+ global locations • 100+ delivery orgs • 250+ data stores • 1.7M+ datasets • Config Data Catalog for Analytics → Enterprise Data Catalog: 20 months

9. Why Use Recommendation Systems? Let’s look at two use cases to understand the power of recommendations. Amazon • Uses recommendations as targeted marketing throughout its website • The store radically changes based on the customer’s interests • According to McKinsey & Company, 35% of Amazon.com’s revenue is generated by its recommendation engine (https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers). Netflix • Uses recommendations to generate the top 10 titles for user households • Uses customer feedback as a signal in its engine so that recommendations become more personalized • According to McKinsey & Company, 75% of what users watch on Netflix comes from product recommendations (https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers).

10. How Can Recommendation Systems Help UDC? UDC user base: • Data Scientists/Analysts • Developers • Administrators • *GDPR/Privacy/Security. Long-tail effect: user + dataset details => personalized recommendations.

11. Recommended Datasets – Existing Users

12. Recommendations – Challenges

13. Recommended Datasets – New User ("cold start" problem)

14. Recommendation Systems: • Matrix-based – cold start problem! • Graph-based – cold start problem solved
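
Below is a toy sketch of why the graph-based approach avoids the cold start problem. It assumes a cosine-similarity neighbour search over a user × dataset matrix, which stands in for "matrix-based", and a REPORTS_TO org edge, which stands in for the user-manager relationship mentioned on slide 19. All user, dataset and relationship names are hypothetical. A new user's matrix row is all zeros, so no neighbours are found. The graph still connects that user to datasets through their manager's other reports.

```python
import math

# Hypothetical user x dataset view matrix; columns: orders, payments, sales_daily.
MATRIX = {
    "alice": [1, 0, 1],
    "bob": [1, 1, 0],
    "dave": [0, 0, 0],  # new user: no views yet, so the row is all zeros
}

# Hypothetical graph edges: dave has no VIEWED edges but does have an org edge.
EDGES = [
    ("alice", "REPORTS_TO", "mgr1"),
    ("bob", "REPORTS_TO", "mgr1"),
    ("dave", "REPORTS_TO", "mgr1"),
    ("alice", "VIEWED", "orders"),
    ("alice", "VIEWED", "sales_daily"),
    ("bob", "VIEWED", "orders"),
    ("bob", "VIEWED", "payments"),
]


def cosine(a, b):
    """Cosine similarity of two rows; 0.0 when either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matrix_neighbours(user):
    """Matrix view: users whose rows overlap with `user`'s row."""
    return sorted(
        other for other in MATRIX
        if other != user and cosine(MATRIX[user], MATRIX[other]) > 0
    )


def graph_recommend(user):
    """Graph view: user -REPORTS_TO-> manager <-REPORTS_TO- peer -VIEWED-> dataset."""
    managers = {dst for src, rel, dst in EDGES if src == user and rel == "REPORTS_TO"}
    peers = {src for src, rel, dst in EDGES
             if rel == "REPORTS_TO" and dst in managers and src != user}
    seen = {dst for src, rel, dst in EDGES if src == user and rel == "VIEWED"}
    candidates = {dst for src, rel, dst in EDGES if src in peers and rel == "VIEWED"}
    return sorted(candidates - seen)
```

For the new user "dave", `matrix_neighbours("dave")` is empty. `graph_recommend("dave")` returns the datasets viewed by his teammates.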

15. Conceptual Model - Connected Components
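
If "connected components" is read in the graph-theory sense, the idea looks like this: users, datasets and stores are nodes, relationships are edges, and entities that are linked by any path fall into the same component. The sketch uses a small union-find over a hypothetical edge list. Spark GraphX (`connectedComponents`) and Neo4j Graph Data Science (weakly connected components) provide this algorithm at scale. The code below is a standalone illustration, not the UDC implementation.

```python
from collections import defaultdict

# Hypothetical UDC-like metadata edges: (source, relationship, target).
EDGES = [
    ("user:alice", "VIEWED", "dataset:orders"),
    ("dataset:orders", "STORED_IN", "store:mysql_prod"),
    ("user:bob", "VIEWED", "dataset:orders"),
    ("user:carol", "VIEWED", "dataset:hr_roster"),
    ("dataset:hr_roster", "STORED_IN", "store:hive_hr"),
]


def connected_components(edges):
    """Group nodes that are linked by any path, ignoring edge direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for src, _rel, dst in edges:
        union(src, dst)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort members and components so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```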

16. Neo4j & Spark • Easy bootstrap • Cypher • Spark APIs • Rich graph algorithms • Nice visualization • APOC procedures • Distributed processing • Spark ML + GraphX • GIMEL + Spark – data access simplified

17. The Graph

18. UDC – Recommendations Architecture

19. Data sources: SSO Org. Data • UDC Metadata • API Calls from UDC UI • User-Manager Rel. • UDC Data • NB Logs → Cleaning & Creating → Building the Graph → Recommendations
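
The "Cleaning & Creating" and "Building the Graph" steps can be sketched for one source, such as API calls logged from the UDC UI. The sketch assumes the raw events are (user, action, dataset) tuples, a hypothetical format. Cleaning means normalizing names and dropping malformed rows. Aggregation then gives one weighted VIEWED edge per user–dataset pair. The Cypher string uses standard `UNWIND`/`MERGE` clauses to show one possible batch load, passing the rows as a `$rows` parameter through a Neo4j driver. It is not executed here, and the node labels and property names are assumptions.

```python
from collections import Counter

# Hypothetical raw events, e.g. from UDC UI API-call logs: (user, action, dataset).
RAW_EVENTS = [
    ("Alice", "view", "orders"),
    ("alice ", "view", "orders"),
    ("bob", "view", "payments"),
    ("", "view", "orders"),   # malformed: missing user
    ("bob", "click", None),   # malformed: missing dataset
]

# Illustrative batch-load query (labels/properties are assumptions; not executed here).
LOAD_VIEWS_CYPHER = """
UNWIND $rows AS row
MERGE (u:User {id: row.user})
MERGE (d:Dataset {name: row.dataset})
MERGE (u)-[r:VIEWED]->(d)
SET r.count = row.count
"""


def clean_and_aggregate(events):
    """Normalize names, drop malformed rows, and count views per (user, dataset)."""
    counts = Counter()
    for user, action, dataset in events:
        user = (user or "").strip().lower()
        dataset = (dataset or "").strip().lower()
        if not user or not dataset or action != "view":
            continue
        counts[(user, dataset)] += 1
    return [
        {"user": u, "dataset": d, "count": c}
        for (u, d), c in sorted(counts.items())
    ]
```

Aggregating before loading keeps the graph at one edge per pair, with the view count stored as an edge property rather than as duplicate edges.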

20. Abstraction Layer

21. Group Affinity
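
The slide title alone does not say how group affinity is scored. One plausible reading, assumed here, is to recommend datasets that a large share of the user's org group already uses. The group mapping, the `min_share` threshold and all names below are hypothetical.

```python
from collections import Counter

# Hypothetical org membership and dataset usage.
ORG = {"alice": "payments-eng", "bob": "payments-eng", "erin": "payments-eng",
       "frank": "payments-eng", "carol": "risk"}
VIEWS = {
    "bob": {"orders", "refunds"},
    "erin": {"refunds", "chargebacks"},
    "frank": {"refunds"},
    "carol": {"risk_scores"},
}


def group_affinity(user, org, views, min_share=0.5):
    """Datasets used by at least `min_share` of the user's group peers, highest share first."""
    group = org.get(user)
    peers = [m for m, g in org.items() if g == group and m != user]
    if not peers:
        return []
    seen = views.get(user, set())
    # Count each dataset once per peer, skipping what the user already uses.
    counts = Counter(d for p in peers for d in views.get(p, set()) - seen)
    shares = {d: c / len(peers) for d, c in counts.items()}
    return sorted((d for d, s in shares.items() if s >= min_share),
                  key=lambda d: (-shares[d], d))
```

Normalizing by group size is one design choice. It keeps a large team from dominating purely by headcount.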

22. User Views
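
The slide does not spell out how "User Views" recommendations are produced. A common approach, assumed here, is item-to-item co-occurrence: "users who viewed dataset X also viewed Y". The toy view data and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical per-user view history.
VIEWS = {
    "alice": {"orders", "payments"},
    "bob": {"orders", "payments", "refunds"},
    "carol": {"orders", "refunds"},
    "dave": {"orders"},
}


def co_view_counts(views):
    """Count ordered (a, b) pairs of datasets viewed by the same user."""
    pairs = Counter()
    for items in views.values():
        pairs.update(permutations(sorted(items), 2))
    return pairs


def user_view_recs(user, views, top_n=3):
    """Score unseen datasets by how often they co-occur with the user's viewed ones."""
    pairs = co_view_counts(views)
    seen = views.get(user, set())
    scores = Counter()
    for (a, b), c in pairs.items():
        if a in seen and b not in seen:
            scores[b] += c
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [d for d, _ in ranked[:top_n]]
```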

23. Global Top Picks
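
A minimal sketch of a global top-picks list follows. It assumes the list ranks datasets by the number of distinct users who viewed them within a recent window. The event tuples, the 30-day window and `top_n` are hypothetical parameters. Counting distinct viewers rather than raw views keeps one heavy user from pushing a dataset to the top.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical view events: (user, dataset, day viewed).
EVENTS = [
    ("alice", "orders", date(2019, 4, 1)),
    ("bob", "orders", date(2019, 4, 2)),
    ("bob", "orders", date(2019, 4, 3)),        # repeat view by the same user
    ("carol", "payments", date(2019, 4, 3)),
    ("dave", "legacy_dump", date(2018, 1, 1)),  # outside the window
    ("erin", "payments", date(2019, 4, 2)),
    ("erin", "refunds", date(2019, 4, 2)),
]


def global_top_picks(events, today, window_days=30, top_n=2):
    """Rank datasets by distinct recent viewers; ties broken by name."""
    cutoff = today - timedelta(days=window_days)
    viewers = defaultdict(set)
    for user, dataset, day in events:
        if day >= cutoff:
            viewers[dataset].add(user)
    ranked = sorted(viewers, key=lambda d: (-len(viewers[d]), d))
    return ranked[:top_n]
```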

24. Recommended Datasets – Recap
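
The recap brings several recommendation lists together: personalized, group affinity, and global top picks. The source does not say how they are combined. The sketch below assumes a simple round-robin merge that skips duplicates. With this merge, a new user with an empty personalized list still gets group and global picks. The function name and `limit` parameter are hypothetical.

```python
from itertools import zip_longest


def blend(*ranked_lists, limit=5):
    """Round-robin merge of several ranked lists, skipping duplicates."""
    out, seen = [], set()
    for tier in zip_longest(*ranked_lists):  # shorter lists are padded with None
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                out.append(item)
                if len(out) == limit:
                    return out
    return out
```

For example, `blend([], ["refunds"], ["orders", "payments"], limit=2)` takes one item from each non-empty list in turn and stops at the limit.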

25. Going Beyond Recommendations

26. What’s next? Expanding relationships: • Workday – Org structure • Identity / LDAP – Access controls • Databases – Owners, users, query logs • Query logs – Lineage • Wiki – Documents, mentions • JIRA – Issues, mentions • Slack – Chats, threads. Expanding use cases: • GDPR • Compliance • Privacy

27. Thank You!

28. Questions?
