REA Group's journey with Data Cataloging. Presented at Amundsen community meeting on November 5th, 2020.
Presented by Stacy Sterling, Abhinay Kathuria and Alex Kompos at REA Group.
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail on the problem, a build/buy/adopt analysis, and Lyft's solution - Amundsen, along with thoughts on the future.
Presentation on dogfooding data at Lyft by Mark Grover and Arup Malakar on Oct 25, 2017 at Big Analytics Meetup (https://www.meetup.com/SF-Big-Analytics/events/243896328/)
Spectator to Participant: Contributing to Cassandra (Patrick McFadin, DataStax)
Feeling the need to contribute something to Apache Cassandra? Maybe you want to help guide the future of your favorite database? Get off the sidelines and get in the game! It's easy to say, but how do you even get started? I will outline some of the ways you can help contribute to Apache Cassandra, from minor to major. If you don't have the time or ability to submit code, there are a lot of ways you can participate. What if you do want to write some code? I can walk you through the process of creating a patch and submitting it for final approval. Got a great idea? I'll show you how to propose that to the community at large. Take it from me, participating is so much more fun than just watching the project from a distance. Time to jump in!
About the Speaker
Patrick McFadin, Chief Evangelist, DataStax
Patrick McFadin is one of the leading experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and a consultant for DataStax, he has helped build some of the largest and most exciting production deployments. Prior to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D... (Databricks)
Many have dubbed the 2020s the decade of data. This is indeed an era of data zeitgeist.
From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central role in our everyday lives.
As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale and as cloud computing offers cheap computing and data storage resources at scale, the data platforms have to match in their abilities to process, analyze, and visualize at scale and speed and with ease — this involves data paradigm shifts in processing and storing and in providing programming frameworks to developers to access and work with these data platforms.
In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they help future data scientists get started quickly.
In particular, we will examine in detail two open-source tools: MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage of structured and unstructured data).
We will also look at other emerging tools, such as Koalas, which helps data scientists do exploratory data analysis at scale in a language and framework they are familiar with, as well as at emerging data + AI trends in 2021.
You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.
LinkedIn Infrastructure (analytics@webscale, at fb 2013) (Jun Rao)
This is the presentation at analytics@webscale in 2013 (http://analyticswebscale.splashthat.com/?em=187&utm_campaign=website&utm_source=sg&utm_medium=em)
Data Security and Protection in DevOps (Karen Lopez)
Presentation to the London #WinOps event, September 2019, focusing on data security, privacy, and protection in DevOps efforts. Includes data masking, dev and test data, Always Encrypted, and more.
This presentation given by Flip Kromer and Huston Hoburg on March 24, 2014 at the MongoDB Meetup in Austin.
Vayacondios is a system we're building at Infochimps to gather metrics on highly complex systems and help humans make sense of their operation. You can think of it as a "data goes in, the right thing happens" machine: send in facts from anywhere about anything, and Vayacondios will promptly process and syndicate them to all consumers. Producers don't have to (or get to) worry about the needs of those who will use the data, or the details of transport, storage, filtering or anything else: the data will go where it needs to go. Each consumer, meanwhile, finds that everything they need to know is available to them, on the fly or on demand, without crufty adapters or extraneous dependencies. They don't have to (or get to) worry about the distribution of their sources, the tempo of update, or how the data came to be.
Vayacondios was built for our technical ops team to monitor all the databases and systems they superintend, but it suggests a better way to build database-driven applications of any kind. The quiet tyranny of developing against a traditional database has left us with many bad habits: not duplicating data, using models that serve the query engine not the user, assembling application objects from raw parts on every page refresh. Combining streaming data processing systems with distributed datastores like MongoDB lets you do your queries on the way _in_ to the database -- any number of queries, decoupled, of any complexity or tempo. The resulting approach is simpler, fault-tolerant, and scales in terms of machines and developers. Most importantly, your data models are purely faithful to the needs of your application, uncontaminated by differing opinions of other consumers or by incidentals of the robots that gather and process and store the data.
Brokering Data: Accelerating Data Evaluation with Databricks White Label (Databricks)
As the data-as-a-service ecosystem continues to evolve, data brokers are faced with an unprecedented challenge – demonstrating the value of their data. Successfully crafting and selling a compelling data product relies on a broker’s ability to differentiate their product from the rest of the market. In smaller or static datasets, measures like row count and cardinality can speak volumes. However, when datasets are in the terabytes or petabytes, differentiation becomes much more difficult. On top of that, “data quality” is a somewhat ill-defined term, and the definition of a “high quality dataset” can change daily or even hourly.
This breakout session will describe Veraset’s partnership with Databricks, and how we have white labeled Databricks to showcase and accelerate the value of our data. We’ll discuss the challenges that data brokers have faced to date and some of the primitives of our businesses that have guided our direction thus far. We will also actively demo our white label instance and notebook to show how we’ve been able to provide key insights to our customers and reduce the TTFB of data onboarding.
Enterprise Search Summit Keynote: A Big Data Architecture for Search (Search Technologies)
This presentation was given by Search Technologies' CEO Kamran Khan at the November 2013 Enterprise Search Summit / KMWorld in Washington DC. He discussed how modern search engines are currently being combined with powerful independent content processing pipelines and the distributed processing technologies from big data to form new and exciting enterprise search architecture, delivering results only available to the biggest companies with the deepest pockets in the past. For more information visit http://www.searchtechnologies.com/.
Lambda architecture for real time big data (Trieu Nguyen)
Lambda Architecture in Real-time Big Data Project
Concepts & Techniques “Thinking with Lambda”
Case study in some real projects
Why is Lambda Architecture the correct solution for big data?
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad... (Databricks)
Amundsen is the data discovery and metadata platform that originated at Lyft and was recently donated to Linux Foundation AI. Since it was open-sourced, Amundsen has been used and extended by many different companies within our community.
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli (Spark Summit)
In the race to invent multi-million dollar business opportunities with exclusive insights, data scientists and engineers are hampered by a multitude of challenges just to make one use case a reality – the need to ingest data from multiple sources, apply real-time analytics, build machine learning algorithms, and intermix different data processing models, all while navigating around their legacy data infrastructure that is just not up to the task. This need has created the demand for Virtual Analytics, where the complexities of disparate data and technology silos have been abstracted away, coupled with a powerful range of analytics and processing horsepower, all in one unified data platform. This talk describes how Databricks is powering this revolutionary new trend with Apache Spark.
This presentation examines some of the top stream analytics platforms in the enterprise. The slide deck explores the characteristics of enterprise stream analytics solutions and discusses the capabilities of some of the top stream analytics platforms in the current market.
As technology and needs evolve and the need for scalable and highly available solutions increases, there is a need to evaluate new databases. The lack of clarity in the market makes it difficult for IT stakeholders to understand the differences between the available solutions and the choice to make. The key areas to consider while evaluating NoSQL databases are data model, query model, consistency model, APIs, support, and community strength.
Azure Data Catalog: Your Data, Your Way, by Eugene Polonichko at DataConf, 21.04.18 (Olga Zinkevych)
Topic of presentation: Azure Data Catalog: your data, your way
The main points of the presentation: it's a fully managed service that lets you—from analyst to data scientist to data developer—register, enrich, discover, understand, and consume data sources.
http://dataconf.com.ua/speaker-page/eugene-polonichko.php
https://www.youtube.com/watch?v=wceGzcQcPOo&list=PL5_LBM8-5sLjbRFUtXaUpg84gtJtyc4Pu&t=0s&index=4
In this talk, we will discuss the technical and non-technical challenges faced in designing a system used to obfuscate LinkedIn member data stored in Hadoop at scale.
To assist in this task, we built WhereHows (open source), which serves as a data discovery and metadata catalog for all the datasets at LinkedIn. We integrated WhereHows with a compliance monitoring tool which uses machine learning to identify datasets with PII (Personally Identifiable Information). We will cover the building blocks of the system and the lessons learned in the process.
Audience takeaways:
- What it takes to obfuscate member data at scale
- The challenges involved in designing the system
- The lessons learned throughout this journey
- What not to assume!
Beyond DevOps: How Netflix Bridges the Gap? (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1mv6Kpr.
Josh Evans uses Netflix Operations Engineering as a case study to explore the challenges faced by centralized engineering teams and approaches to addressing those challenges. Filmed at qconsf.com.
Josh Evans is Director of Operations Engineering at Netflix, with experience in e-commerce, playback control services, infrastructure, tools, testing, and operations.
How do effective large-scale service ecosystems work? Keynote Presentation at Istanbul Tech Talks 2018
How to Design Services
* Systems of record
* Interface specification
* Interface backward / forward compatibility
Service Ecosystems
* Layered services
* "Standardization" through encouragement
* Vendor-customer relationships between teams
Operating and Deploying Services
* Data Migration
* Automated Pipelines
* Incremental Deployment
* Feature Flags
This presentation introduces Kicktag and the Cosmos reporting platform - this is the perfect place to start if you haven't worked with us before, and there are a number of references for further reading.
In this deck, we present an outline of the Cosmos platform including how the reporting modules and data integration tools work together. There are a number of visual examples ranging from basic document libraries to real-time analytics dashboards and bespoke mobile business discovery portals.
Marjorie M. K. Hlava, President, Chair of the Board, and Chief Scientist, Access Innovations, Inc.
During this annual highlight of the DHUG meetings, Margie will discuss the exciting new changes and additions to the Data Harmony software. She will be joined by some members of our software development team to talk about specific initiatives we have worked on over the past year.
Agile Content Development and the IXIASOFT DITA CMS (IXIASOFT)
Keith Schengili-Roberts, IXIASOFT DITA Information Architect, reviews the benefits of working with agile content development and the IXIASOFT DITA CMS.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Fishbowl Solutions' Administration Suite combines our most effective and popular tools for WebCenter administrators. Learn more about how these tools can automate many daily tasks and simplify processes!
A presentation by Mike Jennings and Roger Howard for the Createasphere DAM conference 2011 in Burbank, CA.
The presentation discusses issues in metadata interoperability and tools to improve it -- mostly open-source or free tools.
Play Architecture, Implementation, Shiny Objects, and a Proposal (Mike Slinn)
ScalaCourses.com has been serving online Scala and Play training material to students for over two years. ScalaCourses.com teaches courses on the same technology stack that the web site runs on. The Cadenza application that powers ScalaCourses.com is a Play Framework 2 application, written in Scala and using Akka, Slick, AWS and Postgres. Some of the architectural features in Cadenza that allow a modest-sized Play application to serve large amounts of multimedia data efficiently are discussed, including technical details of how to work with an immutable domain model that can be modified.
Over the last 2+ years the underlying technology has changed a lot; a brief history of Play Framework will be recounted, and how that impacted Cadenza. The talk concludes with a proposal regarding Play Framework's future.
A design system can vastly improve your team's productivity, but most of all, it leads to better products! The challenge lies in creating a mature system and leading its adoption across the company successfully. Let's talk about how we learned to meet the needs of different designers and developers on different products, on different tech stacks, on different platforms. Attendees will go home with tips they can use to improve design systems of any stage.
Similar to REA Group's journey with Data Cataloging and Amundsen
Amundsen: From discovering data to securing data (markgrover)
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
TensorFlow Extended (TFX) and Apache Beam (markgrover)
Talk on TFX and Beam by Robert Crowe, a developer advocate at Google focused on TensorFlow.
Learn how the TensorFlow Extended (TFX) project is utilizing Apache Beam to simplify pre- and post-processing for ML pipelines. TFX provides a framework for managing all of the necessary pieces of a real-world machine learning project beyond simply training and utilizing models. Robert will provide an overview of TFX, and talk in a little more detail about the pieces of the framework (tf.Transform and tf.ModelAnalysis) which are powered by Apache Beam.
In this Strata 2018 presentation, Ted Malaska and Mark Grover discuss how to make the most of big data at speed.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/72396
Near real-time anomaly detection at Lyft (markgrover)
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
Top 5 mistakes when writing Spark applications (markgrover)
This is a talk given at the Advanced Spark meetup in San Francisco (http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/223668878/). It focuses on common mistakes when writing Spark applications and how to avoid them.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
2. How do you pronounce Amundsen?
• American way != Australian way != Norwegian way
3. Agenda
• Why we needed a data catalog and why we chose Amundsen
• An overview of our implementation
• User feedback and customisations
• What's next on our roadmap
Alex Kompos
Data Developer
Abhinay Kathuria
Data Developer
Stacy Sterling
Data Manager
4. Why we needed a data catalog
• REA Group is Australia's largest property advertising portal
• 1,400 employees
• ~500 developers
• ~50 analysts & data scientists
7. Why we chose Amundsen
Pros
• Most of our "must have" features were already available (integration with BigQuery and Airflow)
• Flexibility to customise and build features we needed
• Doesn't rely on manual curation which can become outdated quickly
• Allows users to search for data they don't already have access to
• Clean, intuitive UI
• Opportunity for our team to contribute back to an open-source project
Considerations
• Lacked features that the vendor solutions offered (business metrics glossary, column-level lineage)
• Our team did not have much front-end development experience
• We didn't know how long implementation might take
8. How did we implement
• Implemented a POC last year as a Hackathon project
• Wanted to Productionize an MVP
• Get alpha user feedback
• Release to the wider community
9. Deployment Stack
• AWS ECS for each service
• Neo4j Backend running on EC2
• AWS Managed Elasticsearch
• EFS Storage for Neo4j
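For illustration, a minimal sketch of this stack with the AWS CDK in Python. Everything here is an assumption for the sketch: the construct names, the use of Fargate (the slide only says ECS), the amundsendev image tags, and the ports; Neo4j on EC2 and the managed Elasticsearch domain are omitted, with EFS shown only as the Neo4j data volume.

```python
# Hypothetical CDK v2 sketch of the deployment stack above; names, images,
# ports, and the Fargate launch type are illustrative assumptions.
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs, aws_efs as efs
from constructs import Construct


class AmundsenStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        vpc = ec2.Vpc(self, "AmundsenVpc", max_azs=2)
        cluster = ecs.Cluster(self, "AmundsenCluster", vpc=vpc)

        # One ECS service per Amundsen component (image tags are hypothetical).
        for name, image, port in [
            ("frontend", "amundsendev/amundsen-frontend:2.3.0", 5000),
            ("search", "amundsendev/amundsen-search:2.4.1", 5001),
            ("metadata", "amundsendev/amundsen-metadata:2.5.5", 5002),
        ]:
            task = ecs.FargateTaskDefinition(self, f"{name}-task")
            task.add_container(
                f"{name}-container",
                image=ecs.ContainerImage.from_registry(image),
                port_mappings=[ecs.PortMapping(container_port=port)],
            )
            ecs.FargateService(self, f"{name}-service", cluster=cluster, task_definition=task)

        # EFS file system backing the Neo4j data directory (Neo4j itself runs on EC2).
        efs.FileSystem(self, "Neo4jData", vpc=vpc)


app = App()
AmundsenStack(app, "amundsen")
app.synth()
```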
10. Metadata Extraction
• Using Breeze (internal ETL-as-a-service tool)
• Running a DAG daily
• Scrape metadata from Google BigQuery
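As a sketch of what the daily scrape can look like, here is a databuilder job loosely following the amundsen-databuilder sample scripts, wired to the BigQuery metadata extractor and the Neo4j CSV loader/publisher. The config keys follow those samples and can vary by databuilder version; the project ID, paths, and credentials are placeholders, and at REA this logic runs from a Breeze DAG rather than a standalone script.

```python
# A minimal databuilder job sketch: BigQuery -> CSV -> Neo4j. Config keys
# follow the amundsen-databuilder sample scripts and may differ by version;
# the project id, paths, and credentials are placeholders.
from pyhocon import ConfigFactory

from databuilder.extractor.bigquery_metadata_extractor import BigQueryMetadataExtractor
from databuilder.job.job import DefaultJob
from databuilder.loader.file_system_neo4j_csv_loader import FsNeo4jCSVLoader
from databuilder.publisher.neo4j_csv_publisher import Neo4jCsvPublisher
from databuilder.task.task import DefaultTask


def extract_bigquery_metadata() -> None:
    conf = ConfigFactory.from_dict({
        'extractor.bigquery_table_metadata.project_id': 'my-gcp-project',  # placeholder
        'loader.filesystem_csv_neo4j.node_dir_path': '/tmp/amundsen/nodes',
        'loader.filesystem_csv_neo4j.relationship_dir_path': '/tmp/amundsen/relationships',
        'publisher.neo4j.node_files_directory': '/tmp/amundsen/nodes',
        'publisher.neo4j.relation_files_directory': '/tmp/amundsen/relationships',
        'publisher.neo4j.neo4j_endpoint': 'bolt://neo4j:7687',
        'publisher.neo4j.neo4j_user': 'neo4j',
        'publisher.neo4j.neo4j_password': 'neo4j',  # placeholder
        'publisher.neo4j.job_publish_tag': 'daily',
    })
    # Extract table metadata from BigQuery, stage it as CSVs, publish to Neo4j.
    job = DefaultJob(
        conf=conf,
        task=DefaultTask(extractor=BigQueryMetadataExtractor(), loader=FsNeo4jCSVLoader()),
        publisher=Neo4jCsvPublisher(),
    )
    job.launch()
```

A daily Airflow (Breeze) DAG would then simply call extract_bigquery_metadata as a Python task on its schedule.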
11. What customisations did we make?
• Amundsen is built to be company agnostic
• Each company has a different data culture, data maturity level and domains.
• Over 12 changes to Amundsen
• Based on feedback from alpha users
• Changes that are relevant to a broader audience will be upstreamed
12. How did we implement the changes?
• Customisations are done by building a custom Docker image
• Any changes to source files are then patched when building the image
• We mirror the folder structure on mainline
• Patching is “cheap”
• Version upgrades that involve large refactors will be annoying to deal with
• Forking might be easier in the future
14. Separating service accounts & frequent users
• Our users look to Frequent Users to find domain experts; however, it was polluted by our service accounts, which don’t provide much context
• E.g. vaultxxxxx-xxxxxx--xxxxxx@xxx-xxx-xxxx.iam.gserviceaccount.com
• This was achieved by filtering out users with “gserviceaccount”
• Unsure if this feature would be useful to the broader community
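A minimal illustration of that filter in Python; the helper name and the user record shape are hypothetical, and the real change patches Amundsen's frequent-user handling:

```python
# Illustrative only: drop service accounts from the frequent-user list using
# the "gserviceaccount" substring mentioned above; field names are hypothetical.
from typing import Dict, List

SERVICE_ACCOUNT_MARKER = "gserviceaccount"


def human_frequent_users(frequent_users: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only human users, e.g. drop vault...@...iam.gserviceaccount.com."""
    return [
        user for user in frequent_users
        if SERVICE_ACCOUNT_MARKER not in user.get("email", "")
    ]
```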
15. Advanced search
Amundsen 2.3.0 vs. REA version
• Tooltips that resonated with our users
• Used “BigQuery” Language
• Removed non-applicable filters
• Done through the frontend config
16. Partition Columns
Amundsen 2.3.0 vs. REA version
• Confusion with partition ranges came up.
• Used “BigQuery” Language
• Defaults to “Non-Partitioned Table”
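The fallback described above amounts to something like the following (a Python paraphrase for illustration; the actual change lives in the Amundsen frontend):

```python
# Illustrative paraphrase of the display fallback: show the partition column
# when one exists, otherwise the explicit "Non-Partitioned Table" label.
from typing import Optional


def partition_label(partition_column: Optional[str]) -> str:
    return partition_column if partition_column else "Non-Partitioned Table"
```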
17. What's next on the menu for Amundsen at REA?
Coming up next
• Authentication & authorization (RBAC)
• Preview feature, bookmarks
• Surface Breeze metadata
• Breeze is our ETL-as-a-service tool: an Airflow-based ETL job orchestrator with a YAML-based abstraction layer
• Data Lineage umbrella
• Input/Output tables, transformation logic, schedules
• Ties into our broader Metadata Strategy
• Metadata stored in either a BigQuery table or Kafka
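As a sketch of the Kafka side of that strategy, here is a hypothetical lineage event published with kafka-python; the topic name, event schema, and broker address are all assumptions, not REA's actual design:

```python
# Hypothetical lineage event for the metadata strategy above; the topic,
# schema, and broker address are illustrative assumptions.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

lineage_event = {
    "job": "breeze.daily_listings_load",          # hypothetical Breeze job id
    "inputs": ["project.dataset.raw_listings"],   # upstream tables
    "outputs": ["project.dataset.listings"],      # downstream tables
    "schedule": "0 2 * * *",                      # daily cron schedule
}
producer.send("data-lineage-events", value=lineage_event)
producer.flush()
```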
18. Also in our backlog (not high priority)
• Enforcing table & field descriptions through Breeze
• Adding programmatic descriptions
• Improving the way search results are displayed
• Table-level lineage
• Implementing a tagging strategy
• Integration with a business metrics glossary
• Integration with Tableau Server
• Integration with Kafka topics