Presentation on dogfooding data at Lyft by Mark Grover and Arup Malakar on Oct 25, 2017 at Big Analytics Meetup (https://www.meetup.com/SF-Big-Analytics/events/243896328/)
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail of the problem, build/buy/adopt analysis and Lyft's solution - Amundsen, along with thoughts on the future.
REA Group's journey with Data Cataloging and Amundsenmarkgrover
REA Group's journey with Data Cataloging. Presented at Amundsen community meeting on November 5th, 2020.
Presented by Stacy Sterling, Abhinay Kathuria and Alex Kompos at REA Group.
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail of the problem, build/buy/adopt analysis and Lyft's solution - Amundsen, along with thoughts on the future.
REA Group's journey with Data Cataloging and Amundsenmarkgrover
REA Group's journey with Data Cataloging. Presented at Amundsen community meeting on November 5th, 2020.
Presented by Stacy Sterling, Abhinay Kathuria and Alex Kompos at REA Group.
MongoDB Days Germany: Data Processing with MongoDBMongoDB
Presented by Marc Schwering, Senior Solutions Architect, MongoDB
Modern architectures are moving away from "one size fits all" solutions. The best tools need to be put to the job and given the large amounts of options today, chances are that you’ll end up using MongoDB for your operational workload, as well as Data Processing Systems like Apache Flink or Spark for your high speed data processing needs. When documents or data structures are modeled, there are some key aspects that need to be attended. This takes into consideration the distribution of data nodes, streaming capabilities, performance, aggregation, and queryability options, and how we can integrate the different data processing software that can benefit from subtle but substantial model changes. This session will cover the way how you enhance your architecture using data processing technologies such as Apache Flink and Spark. It will take the audience through the evolution of an app from simple to complex with its architectural requirements . We´ll look into similarities and differences of available technologies and you will walk away with an understanding how to use MongoDB to fulfill more advanced tasks such as personalization through clustering algorithms.
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
This session will give a new dimension to Apache Spark’s usage. See how Apache Spark and other open source projects can be used together in providing a scalable, real-time monitoring system. Apache Spark plays the central role in providing this scalable solution, since without Spark Streaming we would not be able to process millions of events in real time. This approach can provide a lot of learning to the DevOps/Infrastructure domain on how to build a scalable and automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana and some other open-source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day, and generates metrics in near real-time (within 30 seconds). All the components, used along with Apache Spark, are horizontally scalable using any auto-scaling techniques, which enhances the reliability of this efficient and highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned. For example, they still use Spark Streaming’s receiver-based method in certain use cases instead of Direct Streaming, and will share the application of both the methods, giving the knowledge back to the community.
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
These are the slides of my talk on June 30, 2015 at the first event of the Chicago Apache Flink meetup. Although most of the current buzz is about Apache Spark, the talk shows how Apache Flink offers the only hybrid open source (Real-Time Streaming + Batch) distributed data processing engine supporting many use cases: Real-Time stream processing, machine learning at scale, graph analytics and batch processing.
In these slides, you will find answers to the following questions: What is Apache Flink stack and how it fits into the Big Data ecosystem? How Apache Flink integrates with Apache Hadoop and other open source tools for data input and output as well as deployment? What is the architecture of Apache Flink? What are the different execution modes of Apache Flink? Why Apache Flink is an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? Who is using Apache Flink? Where to learn more about Apache Flink?
In this webinar, Adam will explain the benefits and restrictions that are encountered when working with Big Data systems in a modern agile development approach. He will go on to present some of the approaches, both in automation and in their management of testing activities, that his team has successfully adopted in tackling the big data testing challenge.
Tell me more - http://testhuddle.com/resource/big-data-a-new-testing-challenge/
Jumbune helps developers to analyze the data flow within their code and perform distributed debugging of MapReduce application, detect data anomalies, monitor your cluster and also profile your code.
MongoDB Days Germany: Data Processing with MongoDBMongoDB
Presented by Marc Schwering, Senior Solutions Architect, MongoDB
Modern architectures are moving away from "one size fits all" solutions. The best tools need to be put to the job and given the large amounts of options today, chances are that you’ll end up using MongoDB for your operational workload, as well as Data Processing Systems like Apache Flink or Spark for your high speed data processing needs. When documents or data structures are modeled, there are some key aspects that need to be attended. This takes into consideration the distribution of data nodes, streaming capabilities, performance, aggregation, and queryability options, and how we can integrate the different data processing software that can benefit from subtle but substantial model changes. This session will cover the way how you enhance your architecture using data processing technologies such as Apache Flink and Spark. It will take the audience through the evolution of an app from simple to complex with its architectural requirements . We´ll look into similarities and differences of available technologies and you will walk away with an understanding how to use MongoDB to fulfill more advanced tasks such as personalization through clustering algorithms.
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
This session will give a new dimension to Apache Spark’s usage. See how Apache Spark and other open source projects can be used together in providing a scalable, real-time monitoring system. Apache Spark plays the central role in providing this scalable solution, since without Spark Streaming we would not be able to process millions of events in real time. This approach can provide a lot of learning to the DevOps/Infrastructure domain on how to build a scalable and automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana and some other open-source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day, and generates metrics in near real-time (within 30 seconds). All the components, used along with Apache Spark, are horizontally scalable using any auto-scaling techniques, which enhances the reliability of this efficient and highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned. For example, they still use Spark Streaming’s receiver-based method in certain use cases instead of Direct Streaming, and will share the application of both the methods, giving the knowledge back to the community.
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
These are the slides of my talk on June 30, 2015 at the first event of the Chicago Apache Flink meetup. Although most of the current buzz is about Apache Spark, the talk shows how Apache Flink offers the only hybrid open source (Real-Time Streaming + Batch) distributed data processing engine supporting many use cases: Real-Time stream processing, machine learning at scale, graph analytics and batch processing.
In these slides, you will find answers to the following questions: What is Apache Flink stack and how it fits into the Big Data ecosystem? How Apache Flink integrates with Apache Hadoop and other open source tools for data input and output as well as deployment? What is the architecture of Apache Flink? What are the different execution modes of Apache Flink? Why Apache Flink is an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? Who is using Apache Flink? Where to learn more about Apache Flink?
In this webinar, Adam will explain the benefits and restrictions that are encountered when working with Big Data systems in a modern agile development approach. He will go on to present some of the approaches, both in automation and in their management of testing activities, that his team has successfully adopted in tackling the big data testing challenge.
Tell me more - http://testhuddle.com/resource/big-data-a-new-testing-challenge/
Jumbune helps developers to analyze the data flow within their code and perform distributed debugging of MapReduce application, detect data anomalies, monitor your cluster and also profile your code.
Automating Postgres Deployments on AWS and VMware, with Terraform and AnsibleEDB
Deploying Postgres on AWS and VMware shouldn’t be a clunky, manual task. In 2020 and beyond, automation and Postgres will go hand in hand.
Highly available, fault-tolerant PostgreSQL database cluster creation and monitoring can now be deployed in a matter of minutes using Terraform and Ansible scripts. In this webinar, you will learn how developers, devOps, and DBAs can automate cloud and on-premise Postgres deployment with the help of Terraform and Ansible.
We will cover:
- Popular deployment options for Postgres
- Postgres platform database and toolsets
- Automating deployment of Postgres with high availability
- Automating deployment of fault-tolerant Postgres cluster
- Monitoring and managing Postgres at scale
- Demo for Terraform Postgres deployment scripts
The latest distributed system utilizing the cloud is a very complicated configuration in which the components span a plurality of components. Applications for customers are part of products, and service quality targets directly linked to business indicators are needed. Legacy monitoring system based on traditional system management is not linked not only to business indicators but also to measure service quality. Google advocates the idea of site reliability engineering (SRE) and introduces efforts to measure quality of service. Based on the concept of SRE, the service quality monitoring system collects and analyzes logs from various components not only application codes but also whole infrastructure components. Since very large amounts of data must be processed in real time, it is necessary to design carefully with reference to the big data architecture. To utilize this system, you can measure the quality of service, and make it possible to continuously improve the service quality.
Monitoring is a critical element of the database ecosystem and for overall performance of the database. It also helps ensure that your database is in a healthy state and contributing to the long term stability of your database and application.
Monitoring becomes tricky without knowing what to monitor and how to monitor those variables.
What you will learn in this webinar:
1. Why is it essential to monitor Postgres?
2. The proactive and reactive monitoring approaches and how they help handle future problems
3. The top 10 monitoring checklist
4. When not to monitor
5. A deeper dive into open source tools for monitoring with Prometheus and Grafana
Monitorama 2015 talk by Brendan Gregg, Netflix. With our large and ever-changing cloud environment, it can be vital to debug instance-level performance quickly. There are many instance monitoring solutions, but few come close to meeting our requirements, so we've been building our own and open sourcing them. In this talk, I will discuss our real-world requirements for instance-level analysis and monitoring: not just the metrics and features we desire, but the methodologies we'd like to apply. I will also cover the new and novel solutions we have been developing ourselves to meet these needs and desires, which include use of advanced Linux performance technologies (eg, ftrace, perf_events), and on-demand self-service analysis (Vector).
Automating the process of continuously prioritising data, updating and deploy...Ola Spjuth
Presentation at Data Innovation Summit 2019 in Stockholm, Sweden.
ABSTRACT
Microscopes are capable of producing vast amounts of data, and when used in automated laboratories both the number and size of images present many challenges for storing, categorizing, analyzing, annotating, and transforming the data into actionable information that can used for decision making; either by humans or machines. In this presentation I will describe the informatics system we have established at the Department of Pharmaceutical Biosciences at Uppsala University, which consists of computational hardware (CPUs, GPUs, storage), middleware (Kubernetes), imaging database (OMERO), and workflow system (Pachyderm) to perform online prioritization of new data, as well as the continuous analytics system to automate the process from captured images to continuously updated and deployed AI models. The AI methodologies include Deep Learning models trained on image data, and conventional machine learning models trained on features extracted from images or chemical structures. Due to the microservice architecture the system is scalable and can be expanded using hybrid-architectures with cloud computing resources. The informatics system serves a robotized cell profiling setup with incubators, liquid handling and high-content microscopy. The lab is quite young and is targeting applications primarily in drug screening and toxicity assessment, with the aim to improve research using AI and intelligent design of experiments.
Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS. Learn how you can leverage AWS services like Amazon RDS, AWS CloudFormation, Auto Scaling, Amazon S3, Amazon Glacier, and Amazon Elastic MapReduce to perform highly performant, reliable, real-time big data analytics while saving time, effort, and money. Gain insight from two years of real-time analytics successes and failures so you don't have to go down this path on your own.
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
5 Key Metrics to Release Better Software FasterDynatrace
No matter how often you deploy your application, or how sophisticated your delivery pipeline is, you always need to know the quality status of the software you are building. This can only be done if you measure it. But measure what exactly?
Andreas Grabner and Brett Hofer, app performance evangelists, explain five key metrics to increase your confidence in securing a safe build for production.
• Learn why metrics can be huge quality gateways
• Identify key metrics to take back to your team (Dev, Test, Ops and Business)
• Understand how to use, measure and report these metrics
• 3 short use cases and how using metrics can help you avoid them
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applicationsconfluent
When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?
Experienced Apache Kafka users know what is important to monitor, which alerts are critical and how to respond to them. They don’t just collect metrics - they go the extra mile and use additional tools to validate availability and performance on both the Kafka cluster and their entire data pipelines.
In this presentation we’ll discuss best practices of monitoring Apache Kafka. We’ll look at which metrics are critical to alert on, which are useful in troubleshooting and what may actually be misleading. We’ll review a few “worst practices” - common mistakes that you should avoid. We’ll then look at what metrics don’t tell you - and how to cover those essential gaps.
Amundsen: From discovering to security datamarkgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
Amundsen: From discovering to security datamarkgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
TensorFlow Extension (TFX) and Apache Beammarkgrover
Talk on TFX and Beam by Robert Crowe, developer advocate at Google, focussed on TensorFlow.
Learn how the TensorFlow Extended (TFX) project is utilizing Apache Beam to simplify pre- and post-processing for ML pipelines. TFX provides a framework for managing all of necessary pieces of a real-world machine learning project beyond simply training and utilizing models. Robert will provide an overview of TFX, and talk in a little more detail about the pieces of the framework (tf.Transform and tf.ModelAnalysis) which are powered by Apache Beam.
In this Strata 2018 presentation, Ted Malaska and Mark Grover discuss how to make the most of big data at speed.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/72396
Near real-time anomaly detection at Lyftmarkgrover
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
Top 5 mistakes when writing Spark applicationsmarkgrover
This is a talk given at Advanced Spark meetup in San Francisco (http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/223668878/). It focusses on common mistakes when writing Spark applications and how to avoid them.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
10. 1. Auditing
• What - Query text
• Who - User
• Where - Cluster
• How - Presto/Hive
• When - Timestamp
• Stats (Optional)
‒ CPU seconds
‒ MB seconds
10
19. 5. Experimentation
• Enrich benchto benchmark
• Golden set of queries
• Benchmark
‒ New version
‒ Config change
• Run queries to evaluate new systems
19
20. Bonus! Dogfooding data platform
• Renaming column name disallowed
‒ Data council
‒ More reviewers
• Testing/debugging event ingestion
‒ Unit testing
‒ Developer SDKs
20