The document discusses metadata and the need for a metadata discovery tool. It provides an overview of metadata, describes different types of users and their needs related to finding and understanding data. It also evaluates different architectural approaches for a metadata graph and considerations for security, guidelines, and other challenges in building such a tool.
Democratizing Data within your organization - Data DiscoveryMark Grover
n this talk, we talk about the challenges at scale in an organization like Lyft. We delve into data discovery as a challenge towards democratizing data within your organization. And, go in detail about the solution to solve the challenge of data discovery.
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail of the problem, build/buy/adopt analysis and Lyft's solution - Amundsen, along with thoughts on the future.
Amundsen: From discovering to security datamarkgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
Speaker: Philippe Mizrahi - Associate Product Manager - Lyft
Abstract: Philippe Mizrahi works on Lyft’s data discovery and metadata engine, Amundsen. With the help of a Neo4j graph database, Amundsen has improved Lyft’s data discovery by reducing time to discover data by 10x.
During this session, Philippe will dive deep into Amundsen’s use cases, impact, and architecture, which effectively combines a comprehensive knowledge graph based upon Neo4j, centralized metadata and other search ranking optimizations to discover data quickly.
Democratizing Data within your organization - Data DiscoveryMark Grover
n this talk, we talk about the challenges at scale in an organization like Lyft. We delve into data discovery as a challenge towards democratizing data within your organization. And, go in detail about the solution to solve the challenge of data discovery.
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail of the problem, build/buy/adopt analysis and Lyft's solution - Amundsen, along with thoughts on the future.
Amundsen: From discovering to security datamarkgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
Speaker: Philippe Mizrahi - Associate Product Manager - Lyft
Abstract: Philippe Mizrahi works on Lyft’s data discovery and metadata engine, Amundsen. With the help of a Neo4j graph database, Amundsen has improved Lyft’s data discovery by reducing time to discover data by 10x.
During this session, Philippe will dive deep into Amundsen’s use cases, impact, and architecture, which effectively combines a comprehensive knowledge graph based upon Neo4j, centralized metadata and other search ranking optimizations to discover data quickly.
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Knowledge graphs - it’s what all businesses now are on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
Graph Data: a New Data Management FrontierDemai Ni
Graph Data: a New Data Management Frontier -- Huawei’s view and Call for Collaboration by Demai Ni:
Huawei provides Enterprise Databases, and are actively exploring the latest technology to provide end-to-end Data Management Solution on Cloud. We are looking at to bridge classic RDMS to Graph Database on a distributed platform.
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Denodo
In this presentation, you will see the new functionalities of the Denodo 6.0 detailing dynamic query optimization engine, managing enterprise deployments, and using information self-service for discovery and search.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DzRtkg.
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem – but the "Variety" problem is largely unaddressed – there is a lot of manual "data wrangling" to mange data models.
These manual processes do not scale well. Not only is the variety of data increasing, also the rate of change in the data definitions is increasing. We can’t keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk will present tools and a methodology to manage Big Data Models in a rapidly changing world. This talk covers:
Creating Semantic Metadata Models of Big Data Resources
Graphical UI Tools for Big Data Models
Tools to synchronize Big Data Models and Application Code
Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
Using Big Data Models with Machine Learning to generate Predictive Models
Developer Collaborative/Coordination processes using Big Data Models and Git
Managing change – Big Data Models with rapidly changing Data Resources
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
Augmenting Big Data Analytics with NirvanaIgor Sfiligoi
It has been proven that Big Data Analytics can provide major competitive advantage, yet most deployments cannot reach the vast majority of the data owned by an organization!
This presentation introduces the concept of tiered Big Data Analytics , with an eye on the use of Nirvana in this scenario.
"Big Data" is big business, but what does it really mean? How will big data impact industries and consumers? This slide deck goes through some of the high level details of the market and how it is revolutionizing the world.
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
As more businesses realised that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volume of non-relational or unstructured forms of data. Nothing shows the picture more starkly than the Gartner Magic quadrant for operational database management systems, which assumes that, by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform. Having a single data platform for managing both well-structured data and NoSQL data is beneficial to users; this approach reduces significantly integration, migration, development, maintenance, and operational issues. Therefore, a challenging research work is how to develop efficient consolidated single data management platform covering both relational data and NoSQL to reduce integration issues, simplify operations, and eliminate migration issues.
In this tutorial, we review the previous work on multi-model data management and provide the insights on the research challenges and directions for future work.
Papers and more materials on this tutorial can be found at: http://udbms.cs.helsinki.fi/?tutorials
Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system down time annually. Spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online bank operations being inaccessible so frequently is disturbing. As a business owner, when systems go down, all processes come to a stop. Work in progress is destroyed and failure to meet SLA’s and contractual obligations can result in expensive fees, adverse publicity, and loss of current and potential future customers. Ultimately the inability to provide a reliable and stable system results in loss of $$$’s. While the failure of these systems is inevitable, the ability to timely predict failures and intercept them before they occur is now a requirement.
A possible solution to the problem can be found is in the huge volumes of diagnostic big data generated at hardware, firmware, middleware, application, storage and management layers indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention and other important use cases.
AI from your data lake: Using Solr for analyticsDataWorks Summit
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, conditional logic, variables and data structures these functions form the basis of a new paradigm that brings many of the features from the relational world into search. These new capabilities form the basis of a powerful functional programming language that enables the implementation of many parallel computing use cases such as anomaly detection, streaming NLP, graph traversal and time-series analysis.
In order to discover and analyze big data, third party tools such as Jupyter, Tableau, and Lucidworks Insights will be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
Amundsen is the data discovery metadata platform that originated from Lyft which is recently donated to Linux Foundation AI. Since its open-sourced, Amundsen has been used and extended by many different companies within our community.
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Knowledge graphs - it’s what all businesses now are on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
Graph Data: a New Data Management FrontierDemai Ni
Graph Data: a New Data Management Frontier -- Huawei’s view and Call for Collaboration by Demai Ni:
Huawei provides Enterprise Databases, and are actively exploring the latest technology to provide end-to-end Data Management Solution on Cloud. We are looking at to bridge classic RDMS to Graph Database on a distributed platform.
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Denodo
In this presentation, you will see the new functionalities of the Denodo 6.0 detailing dynamic query optimization engine, managing enterprise deployments, and using information self-service for discovery and search.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DzRtkg.
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem – but the "Variety" problem is largely unaddressed – there is a lot of manual "data wrangling" to mange data models.
These manual processes do not scale well. Not only is the variety of data increasing, also the rate of change in the data definitions is increasing. We can’t keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk will present tools and a methodology to manage Big Data Models in a rapidly changing world. This talk covers:
Creating Semantic Metadata Models of Big Data Resources
Graphical UI Tools for Big Data Models
Tools to synchronize Big Data Models and Application Code
Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
Using Big Data Models with Machine Learning to generate Predictive Models
Developer Collaborative/Coordination processes using Big Data Models and Git
Managing change – Big Data Models with rapidly changing Data Resources
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
Augmenting Big Data Analytics with NirvanaIgor Sfiligoi
It has been proven that Big Data Analytics can provide major competitive advantage, yet most deployments cannot reach the vast majority of the data owned by an organization!
This presentation introduces the concept of tiered Big Data Analytics , with an eye on the use of Nirvana in this scenario.
"Big Data" is big business, but what does it really mean? How will big data impact industries and consumers? This slide deck goes through some of the high level details of the market and how it is revolutionizing the world.
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
As more businesses realised that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volume of non-relational or unstructured forms of data. Nothing shows the picture more starkly than the Gartner Magic quadrant for operational database management systems, which assumes that, by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform. Having a single data platform for managing both well-structured data and NoSQL data is beneficial to users; this approach reduces significantly integration, migration, development, maintenance, and operational issues. Therefore, a challenging research work is how to develop efficient consolidated single data management platform covering both relational data and NoSQL to reduce integration issues, simplify operations, and eliminate migration issues.
In this tutorial, we review the previous work on multi-model data management and provide the insights on the research challenges and directions for future work.
Papers and more materials on this tutorial can be found at: http://udbms.cs.helsinki.fi/?tutorials
Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system down time annually. Spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online bank operations being inaccessible so frequently is disturbing. As a business owner, when systems go down, all processes come to a stop. Work in progress is destroyed and failure to meet SLA’s and contractual obligations can result in expensive fees, adverse publicity, and loss of current and potential future customers. Ultimately the inability to provide a reliable and stable system results in loss of $$$’s. While the failure of these systems is inevitable, the ability to timely predict failures and intercept them before they occur is now a requirement.
A possible solution to the problem can be found is in the huge volumes of diagnostic big data generated at hardware, firmware, middleware, application, storage and management layers indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention and other important use cases.
AI from your data lake: Using Solr for analyticsDataWorks Summit
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, conditional logic, variables and data structures these functions form the basis of a new paradigm that brings many of the features from the relational world into search. These new capabilities form the basis of a powerful functional programming language that enables the implementation of many parallel computing use cases such as anomaly detection, streaming NLP, graph traversal and time-series analysis.
In order to discover and analyze big data, third party tools such as Jupyter, Tableau, and Lucidworks Insights will be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
Amundsen is the data discovery metadata platform that originated from Lyft which is recently donated to Linux Foundation AI. Since its open-sourced, Amundsen has been used and extended by many different companies within our community.
Data Discovery at Databricks with AmundsenDatabricks
Databricks used to use a static manually maintained wiki page for internal data exploration. We will discuss how we leverage Amundsen, an open source data discovery tool from Linux Foundation AI & Data, to improve productivity with trust by surfacing the most relevant dataset and SQL analytics dashboard with its important information programmatically at Databricks internally.
We will also talk about how we integrate Amundsen with Databricks world class infrastructure to surface metadata including:
Surface the most popular tables used within Databricks
Support fuzzy search and facet search for dataset- Surface rich metadata on datasets:
Lineage information (downstream table, upstream table, downstream jobs, downstream users)
Dataset owner
Dataset frequent users
Delta extend metadata (e.g change history)
ETL job that generates the dataset
Column stats on numeric type columns
Dashboards that use the given dataset
Use Databricks data tab to show the sample data
Surface metadata on dashboards including: create time, last update time, tables used, etc
Last but not least, we will discuss how we incorporate internal user feedback and provide the same discovery productivity improvements for Databricks customers in the future.
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
Everybody wants to go on the “Big Data” hype cycle, “To do Scale”, to use the coolest tools in the market like Hadoop, Apache Spark, Apache Cassandra, etc.
But do they ask themselves is there really a reason for that?
In the talk we’ll make a brief overview to all of the technologies in the Big Data world nowadays and we’ll talk about the problems that really emerge when you’d like to enter the great world of Big Data handling.
Showing you the Hadoop ecosystem and Apache Spark and all of the distributed tools leading the market today, will give you all a notion of what will be the real costs entering that world.
Promise that I’ll share some stories from the trenches :)
(And about the “pool” thing...I don’t really know how to swim)
Big data analytics: Technology's bleeding edgeBhavya Gulati
There can be data without information , but there can not be information without data.
Companies without Big Data Analytics are deaf and dumb , mere wanderers on web.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Data Con LA 2020
Description
Apache Druid is a cloud-native open-source database that enables developers to build highly-scalable, low-latency, real-time interactive dashboards and apps to explore huge quantities of data. This column-oriented database provides the microsecond query response times required for ad-hoc queries and programmatic analytics. Druid natively streams data from Apache Kafka (and more) and batch loads just about anything. At ingestion, Druid partitions data based on time so time-based queries run significantly faster than traditional databases, plus Druid offers SQL compatibility. Druid is used in production by AirBnB, Nielsen, Netflix and more for real-time and historical data analytics. This talk provides an introduction to Apache Druid including: Druid's core architecture and its advantages, Working with streaming and batch data in Druid, Querying data and building apps on Druid and Real-world examples of Apache Druid in action
Speaker
Matt Sarrel, Imply Data, Developer Evangelist
Is the traditional data warehouse dead?James Serra
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse or can I just put everything in a data lake and report off of that? No! In the presentation I’ll discuss why you still need a relational data warehouse and how to use a data lake and a RDBMS data warehouse to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
High value analytics in FS are being enabled by Graph, machine learning and Spark technologies. To make these real at production scale HPC technologies are more appropriate than commodity clusters.
AzureDay - Introduction Big Data Analytics.Łukasz Grala
AzureDay North 2016. Conference about cloud solutions.
What is Analytics? What is Big Data? Why Big Data we have in the cloud. What offer Microsoft for Big Data Analytics. How to start with Big Data Analytics or Advanced Analytics? Session introduce fundamentals for Big Data and Advanced Analytics.
By Data Scientist as a Service
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/Bvmvc9
Data prep and data blending are terms that have come to prominence over the last year or two. On the surface, they appear to offer functionality similar to data virtualization…but there are important differences!
In this session, you will learn:
• How data virtualization complements or contrasts technologies such as data prep and data blending
• Pros and cons of functionality provided by data prep, data catalog and data blending tools
• When and how to use these different technologies to be most effective
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Ameya Kanitkar: Using Hadoop and HBase to Personalize Web, Mobile and Email E...WebExpo
“Big Data Tools to Build Personalization”
More at http://webexpo.net/prague2013/talk/using-hadoop-and-hbase-to-personalize-web-mobile-and-email-experience-for-millions-of-users/
This presentation describes what Search Analytics is, what value it brings to the table, how it can be used, what additional functionality and values can be build with search data, etc.
REA Group's journey with Data Cataloging and Amundsenmarkgrover
REA Group's journey with Data Cataloging. Presented at Amundsen community meeting on November 5th, 2020.
Presented by Stacy Sterling, Abhinay Kathuria and Alex Kompos at REA Group.
Amundsen: From discovering to security datamarkgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
TensorFlow Extension (TFX) and Apache Beammarkgrover
Talk on TFX and Beam by Robert Crowe, developer advocate at Google, focussed on TensorFlow.
Learn how the TensorFlow Extended (TFX) project is utilizing Apache Beam to simplify pre- and post-processing for ML pipelines. TFX provides a framework for managing all of necessary pieces of a real-world machine learning project beyond simply training and utilizing models. Robert will provide an overview of TFX, and talk in a little more detail about the pieces of the framework (tf.Transform and tf.ModelAnalysis) which are powered by Apache Beam.
In this Strata 2018 presentation, Ted Malaska and Mark Grover discuss how to make the most of big data at speed.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/72396
Near real-time anomaly detection at Lyftmarkgrover
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
Presentation on dogfooding data at Lyft by Mark Grover and Arup Malakar on Oct 25, 2017 at Big Analytics Meetup (https://www.meetup.com/SF-Big-Analytics/events/243896328/)
Top 5 mistakes when writing Spark applicationsmarkgrover
This is a talk given at Advanced Spark meetup in San Francisco (http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/223668878/). It focusses on common mistakes when writing Spark applications and how to avoid them.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
10. Search based Lineage based Network based
Where is the
table/dashboard for X?
What does it contain?
I am changing a data
model, who are the owner
and most common users?
I want to follow a power
user in my team.
Does this analysis already
exist?
This table’s delivery was
delayed today, I want to
notify everyone
downstream.
I want to bookmark tables of
interest and get a feed of
data delay, schema change,
incidents.
Other requirements
● Leverage as much data automatically as possible
● Preferably, open source and healthy community
● API availability
● Easy to set up
11.
12. Criteria / Products Alation Where
Hows
Airbnb
Data
Portal
Cloudera
Navigator
Apache
Atlas
Search based
Lineage based
Network based
Hive/Presto support
Redshift support
Open source (pref.)
13. First person to discover the South Pole -
Norwegian explorer, Roald Amundsen
14. Power user
● All info in their head
● Get interrupted a lot
due to questions
● Lost
● Ask “power users” a
lot of questions
● Dependencies
landing on time
● Communicating with
stakeholders
Noob user Manager
15.
16.
17.
18.
19.
20.
21.
22.
23.
24. Postgres Hive Redshift ... Presto
Github
Source
File
Databuilder Crawler
Neo4j
Elastic
Search
Metadata Service Search Service
Frontend ServiceML
Feature
Service
Security
Service
Other Microservices
Metadata Sources
25.
26. Pull Model Push Model
● Periodically update the index by pulling from
the system (e.g. database) via crawlers.
● Onus of integration lays on data graph
● No interface to prescribe, hard to maintain
crawlers
● The system (e.g. DB) pushes to a message
bus which downstream subscribes to.
● Onus of integration lies on database
● Message format serves as the interface
● Allows for near-real time indexing
Crawler
Database Data graph
Scheduler
Database Message
queue
Data graph
Preferred if
● Near-real time indexing is important
● Clean interface doesn’t exist
● Other tools like Wherehows are moving
towards Push Model
Preferred if
● Waiting for indexing is ok
● Working with “strapped” teams
● There’s already an interface
30. Relevance Popularity
Tables:
● Descriptions
● Table names, column names
● Tags
Dashboards:
● Description
● Chart names
Tables:
● Querying activity
● Different weights for automated vs adhoc
querying
Dashboards:
● Number of views
● Number of edits
31.
32. “This is God’s
work” - George
X, ex-head of
Analytics, Lyft
“I was on call and
I’m confident 50%
of the questions
could have been
answered by a
simple search in
Amundsen” -
Bomee P, DS, Lyft
42. Discovery Curation
Guidelines on:
- Where to store
data
- How to name
tables, dashes,
columns, etc.
Challenges
- How much do we push
for guidelines vs. just make
it discoverable?
43. Discovery Security
- Provide data &
metadata only on
a per need basis
Challenges
- Do we hide the existence
of a data set?
- Do expose metadata
regardless of whether the
data scientist has access
to the data?