Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
This presentation briefly describes key features of Apache Cassandra. It was given at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
Introduction to memcached, a caching service designed to optimize performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given to 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
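As a rough illustration of the caching pattern such a talk covers, here is a minimal cache-aside sketch in Python, with a plain dict standing in for the memcached client and a stub function standing in for the MySQL query (both hypothetical):

```python
# Cache-aside pattern as used with memcached in a MySQL/PHP stack,
# sketched in Python. The dict stands in for a memcached client and
# query_db stands in for a real SQL SELECT (both illustrative).

cache = {}          # stands in for the memcached client
DB_CALLS = 0        # counts how often we hit the "database"

def query_db(user_id):
    """Pretend SQL lookup; in a real stack this would be a SELECT."""
    global DB_CALLS
    DB_CALLS += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    row = cache.get(key)          # 1. try the cache first
    if row is None:
        row = query_db(user_id)   # 2. on a miss, hit the database
        cache[key] = row          # 3. populate the cache for next time
    return row

get_user(42)   # miss: goes to the database
get_user(42)   # hit: served from cache, no second DB call
```

The same read path in PHP with the real memcached client follows this exact shape: `get`, fall through to SQL on a miss, then `set`.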
Introduction and Overview of Apache Kafka, TriHUG, July 23, 2013 (mumrah)
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
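For readers new to Kafka, the core abstraction described above (a distributed pub/sub log) can be sketched in a few lines of Python; the class and method names here are illustrative, not Kafka's API:

```python
# Minimal sketch of Kafka's core abstraction: an append-only log that
# consumers read at their own pace by tracking offsets. Illustrative
# only; not the Kafka client API.

class Log:
    def __init__(self):
        self.messages = []

    def publish(self, msg):
        self.messages.append(msg)          # producers only ever append

    def consume(self, offset, max_n=10):
        """Return up to max_n messages starting at `offset`."""
        batch = self.messages[offset:offset + max_n]
        return batch, offset + len(batch)  # new offset to resume from

topic = Log()
for i in range(5):
    topic.publish(f"event-{i}")

# Two independent consumers keep their own offsets (pub/sub fan-out):
batch_a, off_a = topic.consume(0, max_n=3)
batch_b, off_b = topic.consume(0)
```

Because consumers own their offsets, a slow consumer never blocks a fast one, which is a large part of why the design scales.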
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency (ScyllaDB)
This webinar compares NoSQL and NewSQL databases. We will look at the significant architectural differences between the two and the tradeoffs between availability, scalable performance, and consistency; compare data models; and share benchmark results that show the performance implications of NoSQL versus NewSQL.
When does InnoDB lock a row? Multiple rows? Why would it lock a gap? How do transactions affect these scenarios? Locking is one of the more opaque features of MySQL, but it's very important for both developers and DBAs to understand if they want their applications to work with high performance and concurrency. This is a creative presentation that illustrates the scenarios for locking in InnoDB and makes them easier to visualize. I'll cover: key locks, table locks, gap locks, shared locks, exclusive locks, intention locks, insert locks, auto-inc locks, and also the conditions for deadlocks.
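The gap-locking scenario mentioned above can be sketched conceptually; this toy Python lock table (not InnoDB code, and with illustrative values) shows why a range lock must also cover the gaps between index records to prevent phantom inserts:

```python
# Conceptual sketch (not InnoDB internals) of why gap locks exist:
# they stop phantom inserts between index records while another
# transaction holds a range lock. Values are illustrative.

index_records = [10, 20, 30]       # committed index entries
gap_locks = set()                  # set of (low, high) locked ranges

def lock_range(low, high):
    """Roughly: SELECT ... WHERE k BETWEEN low AND high FOR UPDATE.
    Locks the range so no new row can appear inside it."""
    gap_locks.add((low, high))

def try_insert(value):
    for low, high in gap_locks:
        if low <= value <= high:
            return "blocked"       # another txn holds a gap lock here
    index_records.append(value)
    return "inserted"

lock_range(10, 20)                 # txn A locks the 10..20 range
r1 = try_insert(15)                # txn B: phantom insert is blocked
r2 = try_insert(35)                # outside the locked gap: proceeds
```

Real InnoDB locks gaps between specific index records (next-key locks) rather than arbitrary ranges, but the phantom-prevention idea is the same.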
Though MySQL and MariaDB share the same roots for replication, and both support parallel replication, they diverge in how parallel replication is implemented.
Faster, better, stronger: The new InnoDB (MariaDB plc)
For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. Next, we removed a number of parameters to reduce unnecessary complexity, not only in terms of configuration but of the code itself. And finally, we improved crash recovery thanks to better consistency checks and we reduced memory consumption and file I/O thanks to an all new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
TL;DR: How do you make Apache Spark process data efficiently? Lessons learned from running a petabyte-scale Hadoop cluster and dozens of Spark job optimisations, including the most spectacular: from 2500 gigs of RAM to 240.
Apache Spark is extremely popular for processing data on Hadoop clusters. If your Spark executors go down, the memory allocation is increased. If processing is too slow, the number of executors is increased. This works for a while, but sooner or later you end up with a whole cluster fully utilized in an inefficient way.
During the presentation, we will present our lessons learned and performance improvements on Spark jobs including the most spectacular: from 2500 gigs of RAM to 240. We will also answer the questions like:
- How do PySpark jobs differ from Scala jobs in terms of performance?
- How does caching affect dynamic resource allocation?
- Why is it worth using mapPartitions?
and many more.
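The mapPartitions point above can be illustrated without Spark at all; this pure-Python sketch (with illustrative names and a fake setup cost) shows how paying an expensive setup once per partition instead of once per record is where the win comes from:

```python
# Pure-Python sketch (no Spark required) of why mapPartitions can beat
# map: per-record work that needs an expensive setup pays that cost
# once per partition instead of once per record.

SETUP_COUNT = 0

def expensive_setup():
    """Stands in for opening a DB connection, loading a model, etc."""
    global SETUP_COUNT
    SETUP_COUNT += 1
    return lambda x: x * 2

def map_style(records):
    # map(): the naive version re-creates the resource per record
    return [expensive_setup()(r) for r in records]

def map_partitions_style(partitions):
    # mapPartitions(): one setup per partition, reused for all records
    out = []
    for part in partitions:
        fn = expensive_setup()
        out.extend(fn(r) for r in part)
    return out

partitions = [[1, 2, 3], [4, 5, 6]]

a = map_style([r for p in partitions for r in p])
setups_map = SETUP_COUNT           # one setup per record
SETUP_COUNT = 0
b = map_partitions_style(partitions)
setups_parts = SETUP_COUNT         # one setup per partition
```

In real Spark the same effect comes from writing `rdd.mapPartitions(fn)` where `fn` does its setup once and then iterates the partition.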
DNS is critical network infrastructure, and securing it against attacks such as DDoS, NXDOMAIN floods, hijacking, and malware/APT is essential to protecting any business.
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016 (Amazon Web Services)
This session provides the attendee with an overview of Amazon RDS across different database types and then dives deep into the benefits and performance of Amazon Aurora.
Under the Hood of a Shard-per-Core Database Architecture (ScyllaDB)
Most databases are based on architectures that pre-date advances to modern hardware. This results in performance issues, the need to overprovision, and a high total cost of ownership. In this webinar we will discuss the advances to modern server technology and take a deep dive into Scylla’s shard-per-core architecture and our asynchronous engine, the Seastar framework.
Join us to learn how Seastar (and Scylla):
Avoid locks and contention at the CPU level
Bypass kernel bottlenecks
Implement a per-core, shared-nothing autosharding mechanism
Utilize modern storage hardware
Leverage NUMA to get the best RAM performance
Balance your data across CPUs and nodes for the best and smoothest performance
Plus we’ll cover the advantages of unlocking vertical scalability.
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models, as well as the definition of documents and general data structures.
Choosing the right Professional Employer Organization (PEO) will help your business remain in compliance, leverage the efficiencies of great technology, and facilitate access to comprehensive capabilities that will benefit your business and its employees. So what are the most basic and key things you need to know about PEOs?
The Top Six Early Detection and Action Must-Haves for Improving Outcomes (Health Catalyst)
Given the industry’s shift toward value-based, outcomes-based healthcare, organizations are working to improve outcomes. One of their top outcomes improvement priorities should be early detection and action, which can significantly improve clinical, financial, and patient experience outcomes. Through early detection and action, systems embrace a proactive approach to healthcare that aims to prevent illness; the earlier a condition is detected, the better the outcome.
But, as with most things in healthcare, improving early detection is easier said than done. This executive report provides helpful, actionable guidance about overcoming common barriers (logistical, cultural, and technical) and improving early detection and action by integrating six must-haves:
Multidisciplinary teams
Analytics
Leadership-driven culture change
Creative customization
Proof-of-concept pilot projects
In this presentation, Akka Team Lead and author Roland Kuhn presents the freshly released final specification for Reactive Streams on the JVM. This work was done in collaboration with engineers representing Netflix, Red Hat, Pivotal, Oracle, Typesafe, and others to define a standard for passing streams of data between threads in an asynchronous and non-blocking fashion. This is a common need in Reactive systems, which handle streams of "live" data whose volume is not predetermined.
The most prominent issue facing the industry today is that resource consumption needs to be controlled so that a fast data source does not overwhelm the stream destination. Asynchrony is needed to enable the parallel use of computing resources, whether on collaborating network hosts or on multiple CPU cores within a single machine.
Here we'll review the mechanisms employed by Reactive Streams, discuss the applicability of this technology to a variety of problems encountered in day to day work on the JVM, and give an overview of the tooling ecosystem that is emerging around this young standard.
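The backpressure mechanism at the heart of Reactive Streams, the subscriber signalling demand with request(n), can be sketched in plain Python; the roles mirror the spec's Publisher/Subscriber/Subscription split, but this is not the actual JVM API:

```python
# Sketch of the Reactive Streams idea in plain Python: the subscriber
# signals demand with request(n), so a fast publisher can never push
# more data than the consumer asked for. Illustrative, not the spec's
# JVM interfaces.

class Subscription:
    def __init__(self, source, subscriber):
        self.source = iter(source)
        self.subscriber = subscriber

    def request(self, n):
        """Deliver at most n elements; this is the backpressure signal."""
        for _ in range(n):
            try:
                self.subscriber.on_next(next(self.source))
            except StopIteration:
                self.subscriber.on_complete()
                return

class CollectingSubscriber:
    def __init__(self):
        self.received = []
        self.done = False

    def on_next(self, item):
        self.received.append(item)

    def on_complete(self):
        self.done = True

sub = CollectingSubscriber()
s = Subscription(range(100), sub)
s.request(3)     # consumer asks for 3 elements, gets exactly 3
s.request(2)     # asks for 2 more only when it is ready
```

The real specification adds asynchrony and strict rules about signal ordering, but demand-driven delivery is the core contract.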
In celebration of International Women's Day, we dug into some of our most interesting interviews with women in marketing and have put together the following slideshow highlighting some words of wisdom. Happy Women's Day!
Leading Adaptive Change to Create Value in Healthcare (Health Catalyst)
In pursuit of the Triple Aim, healthcare leaders work hard to improve care, reduce costs, and improve the patient experience. But accomplishing these goals requires an engaged staff that makes progress, day in and day out. Adaptive Leadership (AL) principles help leaders understand human behavior to mobilize change and overcome work avoidance, which happens when staff operate above or below the productive zone of tension.
By understanding what adaptive work actually is (and that adaptive problems can’t be solved with technical fixes), and why work avoidance happens (because people are overwhelmed; the heat is too high), leaders can keep their teams engaged by using influence and leadership—not authority—to “lower the heat” on their people:
Validate the difficulty of the situation.
Simplify/clarify the work.
Provide additional resources (time, training, etc.)
Dr. Ulstad has worked with healthcare leaders and teams for the last 20 years to help them understand behaviors triggered by rapid, high-volume change, and apply AL principles to guide the changes critical to their organizations’ success.
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef... (Health Catalyst)
Analytics are supposed to provide data-driven solutions, not additional healthcare analytics pitfalls and other related inefficiencies. Yet such issues are quite common, and becoming familiar with potential problems will help health systems avoid them in the future. The three common analytics pitfalls are point solutions, EHRs, and independent data marts located in many different databases; an EDW counters all three of these problems. The two inefficiencies are report factories and flavor-of-the-month projects. The solution that best overcomes these inefficiencies is a robust deployment system.
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ... (Health Catalyst)
The big first step toward building an outcomes improvement program is installing the analytics platform. But it’s certainly not the only step. Sustaining healthcare outcomes improvement is a triathlon, and the three legs are:
Installing an analytics platform
Gaining adoption
Implementing best practices
The program requires buy-in, enthusiasm, even evangelizing of analytics and its tools throughout the organization. It also requires that learnings from analysis translate into best practices; otherwise the program fails to produce results and will eventually fade away. Equally important is that top-level leadership across the organization, not just IT, supports and promotes the program on an ongoing basis. We explore each of these elements and how they come together to create the successful and sustainable outcomes improvement that defines leading healthcare organizations.
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail (Health Catalyst)
For healthcare organizations to be successful with their quality and cost improvement initiatives, physicians must be engaged with the proposed changes. But many physicians are not engaged because their morale is suffering. While some strategies to encourage buy-in for improvement initiatives don’t work, there are six strategies that have proven to be effective: (1) discover a common purpose, (2) adopt an engaging style, (3) turn physicians into partners, not customers, (4) segment the engagement plan, (5) use “engaging” improvement methods, and (6) provide them with backup—all the way to the board. Once the organization has their trust, physicians will gain enthusiasm to move forward with improvement efforts that will benefit everyone.
The 3 Must-Have Qualities of a Care Management System (Health Catalyst)
Care management systems are defined in many ways, but the only effective system comprises three qualities:
1.) It’s comprehensive and includes a suite of tools to address all five core competencies of care management.
2.) It’s inclusive of all EMRs and other data sources to enable thorough communication and analysis.
3.) Its analytics-driven design facilitates clinical decision making and workflow.
Ultimately, an effective system improves outcomes and becomes an indispensable tool for managing population health.
This article describes what drives successful care management, and reveals a suite of applications that aid care team members and patients through advanced algorithms and embedded analytics. Learn how technology is helping to develop appropriate interventions and improve clinical and financial outcomes.
How to Sustain Healthcare Quality Improvement in 3 Critical Steps (Health Catalyst)
Many healthcare organizations don’t hold quality and cost gains because they don’t make improvement the backbone of their organization. Rather, they approach improvement as a series of initiatives. Ronald D. Snee, a fellow with the American Society for Quality states, “Many organizations focus on sustaining the gains only after improvement has been achieved. Intuitively, that may seem the correct sequence, but it is in fact backwards. The time to focus on sustaining improvement gains is well before the initiative is launched.”
Here are 3 critical organizational steps that can help sustain those gains.
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes (Health Catalyst)
We developed a predictive analytics framework for patient care based upon concepts from airline operations. Using the idea of an aircraft turnaround time where the airline wants to put the aircraft back into operation as soon as possible, we’ve created a way to help patients headed toward poor outcomes, along with their providers, “turnaround” and get the best possible, most cost-effective outcome. For example, in a diabetes patient, we might use variables such as: age, alcohol use, annual eye/foot exam, BMI, etc. to look for patterns that might influence two outcomes: 1) Diabetic control and 2) The absence of progression toward diabetic complications. The notion of our Patient Flight Path is useful at both the conceptual level, as well as the predictive algorithm implementation level.
Database vs Data Warehouse: A Comparative Review (Health Catalyst)
What are the differences between a database and a data warehouse? A database is any collection of data organized for storage, accessibility, and retrieval. A data warehouse is a type of database that integrates copies of transaction data from disparate source systems and provisions them for analytical use. The important distinction is that data warehouses are designed to handle the analytics required for improving quality and costs in the new healthcare environment. A transactional database, like an EHR, doesn't lend itself to analytics.
Quality Improvement In Healthcare: Where Is The Best Place To Start? (Health Catalyst)
One of the biggest challenges providers face in their quality improvement efforts is knowing where to get started. In my experience, one of the best ways to overcome that "where do we begin?" factor is to use data from an enterprise data warehouse to look for high-cost areas with large variations in how health care is delivered. Variation found through this key process analysis (KPA) is an indicator of opportunity: the more avoidable variation reflected in a particular care process, the more opportunity there is to reduce that variation and standardize the process. Suppose that after performing a KPA you discover three areas of opportunity. How do you determine which one to pursue, especially if it's your first journey into process improvement? The most obvious answer would seem to be the one with the largest potential ROI. That may not always be the best course to pursue, however; you will also want to take into consideration the readiness and openness to change in each of those areas.
MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.
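The auto-sharding idea above can be sketched with a simple hash-based router; this is an illustration of the concept, not MongoDB's actual routing (mongos) code, and the shard names are made up:

```python
# Illustrative sketch of hash-based auto-sharding: a shard key is
# hashed to pick which node stores a document. Shard names and the
# routing function are hypothetical stand-ins for mongos.

import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def route(shard_key):
    """Deterministically map a shard key to one of the shards."""
    h = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# The same key always routes to the same shard, so a read finds the
# document on whichever node the insert placed it.
placement = {k: route(k) for k in ("user:1", "user:2", "user:3")}
```

In MongoDB each "node" here would itself be a replica set, which is what provides the automatic failover the session describes.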
MongoDB: Optimising for Performance, Scale & Analytics (Server Density)
MongoDB is easy to download and run locally but requires some thought and further understanding when deploying to production. At scale, schema design, indexes and query patterns really matter. So does data structure on disk, sharding, replication and data centre awareness. This talk will examine these factors in the context of analytics, and more generally, to help you optimise MongoDB for any scale.
Presented at MongoDB Days London 2013 by David Mytton.
What if you could get blazing fast queries on your data without having to be on call for a giant, expensive database? By picking the right file format for your data, you can store your data on disk in the cloud and still get the performance you need for modern analytics. We'll discuss benchmarks of four different data storage formats: Parquet, ORC, Avro, and traditional character-separated files like CSV. We'll cover what they are, how they work at a bits-and-bytes level, and why you might choose each one for your use case.
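The row-versus-columnar distinction those benchmarks turn on can be sketched in a few lines; pivoting the same records into per-column arrays is, in miniature, why formats like Parquet and ORC let an analytics query read only the columns it needs (data and field names below are illustrative):

```python
# Sketch of why columnar formats (Parquet/ORC) speed up analytics:
# reading one column touches only that column's data, while row
# formats (CSV/Avro) must scan every field of every record.

rows = [  # row-oriented: how CSV stores records, one record at a time
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "US", "amount": 20.0},
    {"id": 3, "country": "DE", "amount": 30.0},
]

# columnar: the same data pivoted so each column is stored contiguously
columns = {
    "id": [1, 2, 3],
    "country": ["DE", "US", "DE"],
    "amount": [10.0, 20.0, 30.0],
}

# Row layout: summing `amount` still walks past id and country fields.
row_sum = sum(r["amount"] for r in rows)

# Columnar layout: only the `amount` column is touched at all.
col_sum = sum(columns["amount"])
```

On disk the columnar layout also compresses far better, since values of one type sit next to each other, which is the other half of the performance story.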
Elasticsearch Architecture & What's New in Version 5 (Burak TUNGUT)
General architectural concepts of Elasticsearch, and what's new in version 5? The examples were prepared using our company's business data and are therefore excluded from the presentation.
What Every Developer Should Know About Database Scalability (jbellis)
Replication. Partitioning. Relational databases. Bigtable. Dynamo. There is no one-size-fits-all approach to scaling your database, and the CAP theorem proved that there never will be. This talk will explain the advantages and limits of the approaches to scaling traditional relational databases, as well as the tradeoffs made by the designers of newer distributed systems like Cassandra. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7955
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv... (MongoDB)
This will cover what to consider for high write throughput performance from hardware configuration through to the use of replica sets, multi-data centre deployments, monitoring and sharding to ensure your database is fast and stays online.
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's... (Glenn K. Lockwood)
Comparing the burst buffers of today, such as the Cray DataWarp-based burst buffer implemented on NERSC Cori, to the proto-burst buffer deployed on SDSC's Gordon supercomputer in 2012.
Leveraging Databricks for Spark Pipelines (Rose Toomey)
How Coatue Management saved time and money by moving Spark pipelines to Databricks.
Talk given at AWS + Databricks ML Dev Day workshop in NYC on 27 February 2020.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The talk covers the key trends across hardware, cloud, and open source; explores how these areas are likely to mature and develop over the short and long term; and considers how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. Me
• Email: my first name @ localytics.com
• twitter.com/andrew311
• andrewrollins.com
• Founder, Chief Software Architect at Localytics
3. Localytics
• Real time analytics for mobile applications
• Built on:
– Scala
– MongoDB
– Amazon Web Services
– Ruby on Rails
– and more…
4. Why I'm here: brain dump!
• To share tips, tricks, and gotchas about:
– Documents
– Indexes
– Fragmentation
– Migrations
– Hardware
– MongoDB on AWS
• Basic to more advanced, a complement to MongoDB Perf Tuning at MongoSF 2011
5. MongoDB at Localytics
• Use cases:
– Anonymous loyalty information
– De-duplication of incoming data
• Requirements:
– High throughput
– Add capacity without long down-time
• Scale today:
– Over 1 billion events tracked in May
– Thousands of MongoDB operations a second
6. Why MongoDB?
• Stability
• Community
• Support
• Drivers
• Ease of use
• Feature rich
• Scale out
9. Use BinData for UUIDs/hashes
Bad:
{
u: "21EC2020-3AEA-1069-A2DD-08002B30309D",
// 36 bytes plus field overhead
}
Good:
{
u: BinData(0, "…"),
// 16 bytes plus field overhead
}
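The size difference is easy to check with a quick stdlib-only sketch (a real driver such as pymongo would wrap the bytes, e.g. in bson.Binary, before they land in MongoDB as BinData):

```python
import uuid

# The same UUID as a canonical string vs. raw bytes.
u = uuid.UUID("21EC2020-3AEA-1069-A2DD-08002B30309D")

as_string = str(u)   # 36 bytes plus field overhead
as_bytes = u.bytes   # 16 bytes plus field overhead

assert len(as_string) == 36
assert len(as_bytes) == 16
```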
10. Override _id
Turn this
{
_id : ObjectId("47cc67093475061e3d95369d"),
u: BinData(0, "…") // <- this is uniquely indexed
}
into
{
_id : BinData(0, "…") // was the u field
}
Eliminated an extra index, but be careful about
locality... (more later, see Further Reading at end)
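The same transformation, sketched as plain Python dicts (field names follow the slide; a real insert would still wrap the bytes in the driver's Binary type):

```python
import uuid

user_uuid = uuid.uuid4()

# Before: the driver assigns _id an ObjectId, and u needs its own
# unique index on top of the mandatory _id index.
before = {"u": user_uuid.bytes}

# After: the UUID *is* the _id, so the mandatory _id index does the work.
after = {"_id": user_uuid.bytes}

assert "u" not in after and "_id" in after
```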
11. Pack 'em in
• Look for cases where you can squish multiple
"records" into a single document.
• Why?
– Decreases number of index entries
– Brings documents closer to the size of a page,
alleviating potential fragmentation
• Example: comments for a blog post.
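A minimal sketch of the blog-post example (field names are illustrative): instead of one indexed document per comment, the post document carries its comments inline:

```python
# One document per comment: every comment costs its own index entry.
comment_docs = [
    {"post_id": 42, "author": "alice", "text": "First!"},
    {"post_id": 42, "author": "bob", "text": "Nice post."},
]

# Packed: the post carries its comments, one _id index entry total,
# and the document grows toward page size instead of scattering.
post_doc = {
    "_id": 42,
    "title": "Sharding tips",
    "comments": [{"author": c["author"], "text": c["text"]}
                 for c in comment_docs],
}

assert len(post_doc["comments"]) == 2
```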
12. Prefix Indexes
Suppose you have an index on a large field, but that field doesn't have
many possible values. You can use a "prefix index" to greatly decrease
index size.
Turn find({k: <kval>})
{
k: BinData(0, "…"), // 32 byte SHA256, indexed
}
into find({p: <prefix>, k: <kval>})
{
k: BinData(0, "…"), // 28 byte SHA256 suffix, not indexed
p: <32-bit integer> // first 4 bytes of k packed into an integer, indexed
}
Example: git commits
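A sketch of the prefix computation, assuming the 32-byte SHA-256 layout described above (the `p`/`k` field names follow the slide):

```python
import hashlib
import struct

def split_key(digest: bytes):
    """Split a 32-byte SHA-256 digest into a 4-byte int32 prefix (indexed)
    and the 28-byte suffix (stored but not indexed)."""
    p = struct.unpack(">i", digest[:4])[0]  # first 4 bytes as a 32-bit int
    k = digest[4:]                          # remaining 28 bytes
    return p, k

digest = hashlib.sha256(b"some record key").digest()
p, k = split_key(digest)
# The query then becomes find({"p": p, "k": k}) instead of find({"k": digest}).
```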
14. Fragmentation
• Data on disk is memory mapped into RAM.
• Mapped in pages (4KB usually).
• Deletes/updates will cause memory
fragmentation.
[Diagram: a page mapped from disk into RAM holds doc1 next to deleted records; find(doc1) pulls in the whole page, deleted space and all.]
15. New writes mingle with old data
[Diagram: a new docX is written into a page that still holds old doc1 and other stale documents.]
find(docX) also pulls in old doc1, wasting RAM
16. Dealing with fragmentation
• "mongod --repair" on a secondary, swap with
primary.
• 1.9 has in-place compaction, but this still holds a
write-lock.
• MongoDB will auto-pad records.
• Pad records yourself by including and then
removing extra bytes on first insert.
– Alternative offered in SERVER-1810.
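Manual padding can be sketched like this (a pymongo-1.x-style collection object, the `__pad` field name, and the pad size are all assumptions):

```python
PAD_BYTES = 1024  # assumed growth headroom; tune to your update pattern

def insert_padded(collection, doc, pad=PAD_BYTES):
    """Insert with a throwaway filler field, then $unset it, leaving the
    record on disk with room to grow in place."""
    padded = dict(doc, __pad="x" * pad)          # temporary filler field
    collection.insert(padded)                    # stored at padded size
    collection.update({"_id": doc["_id"]},
                      {"$unset": {"__pad": 1}})  # free the headroom
```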
17. The Dark Side of Migrations
• Chunks are a logical construct, not physical.
• Shard keys have serious implications.
• What could go wrong?
– Let's run through an example.
18. Suppose the following
• k is the shard key
• k is random
Chunk 1: k 1 to 5
Chunk 2: k 6 to 9
Both chunks live on Shard 1, and writes arrive in random key order:
{k: 3, …} 1st write
{k: 9, …} 2nd write
{k: 1, …} and so on
{k: 7, …}
{k: 2, …}
{k: 8, …}
21. Why is this scenario bad?
• Random reads
• Massive fragmentation
• New writes mingle with old data
22. How can we avoid bad migrations?
• Pre-split, pre-chunk
• Better shard keys for better locality
– Ideally where data in the same chunk tends to be in
the same region of disk
23. Pre-split and move
• If you know your key distribution, then pre-create
your chunks and assign them.
• See this:
– http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
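If the key distribution really is known, e.g. roughly uniform integers, the split points themselves are simple to compute (a sketch; feeding each point to the split/moveChunk admin commands is covered in the linked post):

```python
def split_points(key_min, key_max, n_chunks):
    """Evenly spaced chunk boundaries for a numeric shard key, assuming
    a roughly uniform key distribution (use your real one)."""
    step = (key_max - key_min) // n_chunks
    return [key_min + step * i for i in range(1, n_chunks)]

points = split_points(0, 2**32, 4)
# Each point p would then be handed to the shell, e.g.
# db.runCommand({split: "db.coll", middle: {k: p}})
```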
24. Better shard keys
• Usually means including a time prefix in your
shard key (e.g., {day: 100, id: X})
• Beware of write hotspots
• How to Choose a Shard Key
– http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
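A sketch of such a time-prefixed key (the `day`/`id` field names mirror the example above; the bucketing granularity is an assumption):

```python
import datetime

def shard_key(doc_id, when):
    """Time-prefixed shard key: a coarse day bucket first, then the id,
    so documents written around the same time share chunks."""
    day = (when.date() - datetime.date(1970, 1, 1)).days
    return {"day": day, "id": doc_id}

key = shard_key("abc123", datetime.datetime(1970, 4, 11))
# Matches the slide's example shape: {day: 100, id: "abc123"}
```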
26. Working Set in RAM
• EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
• Workers hammering MongoDB with this loop, growing data:
– Loop { insert 500 byte record; find random record }
• Thousands of ops per second when in RAM
• Much less throughput when working set (in this case, all data
and index) grows beyond RAM.
[Chart: ops per second over time, steady and high while in RAM, dropping sharply once not in RAM.]
27. Pre-fetch
• Updates hold a lock while they fetch the original
from disk.
• Instead do a read to warm the doc in RAM under
a shared read lock, then update.
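The read-then-update pattern can be sketched like this (pymongo-1.x-style method names are an assumption):

```python
def warm_then_update(collection, query, change):
    """Read first to page the document into RAM under the shared read
    lock, then update; the global write lock is then held only for a
    fast in-memory change instead of stalling on a disk fetch."""
    collection.find_one(query)        # warms the doc under the read lock
    collection.update(query, change)  # write lock held only briefly
```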
28. Shard per core
• Instead of a shard per server, try a shard per
core.
• Use this strategy to overcome write locks when
writes per second matter.
• Why? Because MongoDB has one big write lock.
29. Amazon EC2
• High throughput / small working set
– RAM matters, go with high memory instances.
• Low throughput / large working set
– Ephemeral storage might be OK.
– Remember that EBS IO goes over Ethernet.
– Pay attention to IO wait time (iostat).
– Your only shot at consistent perf: use the biggest
instances in a family.
• Read this:
– http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
30. Amazon EBS
• ~200 seeks per second per EBS on a good day
• EBS has *much* better random IO perf than
ephemeral, but adds a dependency
• Use RAID0
• Check out this benchmark:
– http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/
• To understand how to monitor EBS:
– https://forums.aws.amazon.com/thread.jspa?messageID=124044
31. Further Reading
• MongoDB Performance Tuning
– http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
• Monitoring Tips
– http://blog.boxedice.com/mongodb-monitoring/
• Markus' manual
– http://www.markus-gattol.name/ws/mongodb.html
• Helpful/interesting blog posts
– http://nosql.mypopescu.com/tagged/mongodb/
• MongoDB on EC2
– http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
• EC2 and Ephemeral Storage
– http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html
• MongoDB Strategies for the Disk Averse
– http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
• MongoDB Perf Tuning at MongoSF 2011
– http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
32. Thank you.
• Check out Localytics for mobile analytics!
• Reach me at:
– Email: my first name @ localytics.com
– twitter.com/andrew311
– andrewrollins.com