Statistics are a crucial concept in SQL Server: they help you understand why a specific query plan was chosen to execute your query. In this slideshow, I explain some basic concepts of SQL Server Statistics.
Got a question or comment? Write to me at love@withsqlserver.com
Happy DBA'ing!
3. Distribution of data
When tables hold millions of rows, how the data is distributed really matters (to the database!).
For example, it is useful to know that 35% of the employees (from the data in
an Employee table) work from France, 28% from Germany, and so on.
Distribution of data really matters!
http://www.sqlserverapp.com/
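To make the idea of a data distribution concrete, here is a minimal Python sketch. It is only an illustration, not how SQL Server stores anything: the `distribution` function and the sample `countries` column are made up for this example, with the France and Germany shares taken from the slide and a third country added so the numbers reach 100%.

```python
from collections import Counter

def distribution(values):
    """Return each distinct value's share of the rows, as a percentage."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * n / total for value, n in counts.items()}

# Hypothetical Employee.Country column with 100 rows (sample data, not real).
countries = ["France"] * 35 + ["Germany"] * 28 + ["Spain"] * 37
shares = distribution(countries)

for country, percent in shares.items():
    print(f"{country}: {percent:.0f}% of employees")
```

The point is simply that a small summary like `shares` tells you a lot about a column without reading every row again.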
4. Why does it matter so much?
So why all this obsession with data distribution?!
Because only then can SQL Server optimize query execution.
Before it runs a query, SQL Server needs to ‘estimate’ how much data is being fetched.
For example, ‘Is the query fetching about 10% of the data from the table?’
Or ‘Is it getting almost 90% of the data in the table?’
Based on this estimate, the SQL Server Query Optimizer makes several choices about how to
execute the query.
For example, it decides whether or not to use a specific index during the execution.
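The reasoning above can be sketched in a few lines of Python. This is a deliberately simplified toy: the real Query Optimizer is cost-based and does not use a single fixed cutoff. The function name `choose_access_path` and the 10% `seek_cutoff` are assumptions invented for this illustration.

```python
def choose_access_path(estimated_rows, table_rows, seek_cutoff=0.10):
    """Pick an access path from the estimated fraction of rows fetched.

    seek_cutoff is a hypothetical threshold for illustration only;
    SQL Server compares estimated costs rather than a fixed percentage.
    """
    fraction = estimated_rows / table_rows
    return "index seek" if fraction <= seek_cutoff else "table scan"

# A query expected to fetch 5% of a million-row table.
print(choose_access_path(50_000, 1_000_000))
# A query expected to fetch 90% of the same table.
print(choose_access_path(900_000, 1_000_000))
```

Either way, the decision is only as good as the row estimate, and that estimate comes from knowing how the data is distributed.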
6. SQL Server uses a really cool way to track data distribution.
And that’s what we call
Statistics
Statistics … a crucial SQL technique
7. sys.stats
is our hero here!
Not that you are going to look into its data much.
But it is worth getting the hang of it when you find some time
or during your coffee break!
Catalog view for Statistics
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-stats-transact-sql
8. Each Statistics object is created on one or more columns
of a table or indexed view in SQL Server.
For each Statistics object, SQL Server maintains a histogram depicting the distribution of values in its first column.
Statistics objects
Tidbit: What is a Histogram?
A Histogram groups data into ranges and helps display how much data is in each range.
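Here is a small Python sketch of the histogram idea. It uses equal-width ranges to keep things simple, which is an assumption of this example: SQL Server's own histogram picks its step boundaries from the actual column values (up to 200 steps). The `build_histogram` function and the `salaries` sample are hypothetical.

```python
def build_histogram(values, bucket_width):
    """Count how many values fall into each equal-width range."""
    buckets = {}
    for v in values:
        low = (v // bucket_width) * bucket_width
        key = (low, low + bucket_width - 1)  # inclusive range for integers
        buckets[key] = buckets.get(key, 0) + 1
    return dict(sorted(buckets.items()))

# Hypothetical salary column (sample data, not real).
salaries = [1200, 1800, 2500, 2700, 2900, 4100, 4300, 9500]
histogram = build_histogram(salaries, 1000)

for (low, high), count in histogram.items():
    print(f"{low}-{high}: {count} row(s)")
```

With such a summary, a question like "roughly how many rows have a salary between 2000 and 2999?" can be answered without touching the table.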
9. When the automatic creation/updating of Statistics is enabled in SQL Server, SQL Server decides
whether it is really necessary to update the Statistics before a query is run.
It makes this decision based on how much the data in the related tables has changed since the last Statistics
update.
When are they created?
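As a rough illustration of "how much the data has changed", the Python sketch below models the modification thresholds described in Microsoft's Statistics documentation for permanent tables: 500 modifications for tables of 500 rows or fewer, 500 + 20% of the rows for larger tables, and, from SQL Server 2016 under compatibility level 130, the smaller of that value and the square root of 1000 times the row count. Treat it as a simplified model. The function names and the `dynamic` flag are assumptions of this example, and real behavior also depends on version and settings.

```python
import math

def auto_update_threshold(table_rows, dynamic=True):
    """Approximate number of modifications that marks Statistics as stale.

    dynamic=True models the SQL Server 2016+ (compatibility level 130) rule;
    dynamic=False models the older fixed 500 + 20% rule.
    """
    if table_rows <= 500:
        return 500.0
    legacy = 500 + 0.20 * table_rows
    if not dynamic:
        return legacy
    return min(legacy, math.sqrt(1000 * table_rows))

def is_stale(modifications, table_rows, dynamic=True):
    """True when the changes since the last update exceed the threshold."""
    return modifications > auto_update_threshold(table_rows, dynamic)

# A million-row table with 40,000 modified rows since the last update.
print(auto_update_threshold(1_000_000, dynamic=False))
print(auto_update_threshold(1_000_000, dynamic=True))
print(is_stale(40_000, 1_000_000, dynamic=False))
print(is_stale(40_000, 1_000_000, dynamic=True))
```

The dynamic rule matters most for large tables, where waiting for 20% of the rows to change would leave Statistics stale for a long time.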
10. While the concept of Statistics is really cool, we need to keep in mind that it has a performance cost.
Keeping track of the data distribution in every table and indexed view is not trivial.
Keeping the Statistics up to date is important, but there is a question of how up to date
they need to be. Updating Statistics for, say, every insert/delete that happens in a table might be
way too costly. However, updating too rarely might also turn out badly, because then queries will be
executed (i.e. query plans will be chosen by the Query Optimizer) using stale values in the Statistics.
When are they created?
12. Yes! You can create one yourself with the CREATE STATISTICS statement.
But again, be wary of the performance implications of maintaining a Statistics object.
Can you create a Statistics object?
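13. This is possible too. You can update the Statistics objects whenever you want.
You can use the stored procedure sp_updatestats to update all Statistics in the current database, or the UPDATE STATISTICS statement for a single table or Statistics object.
Updating Statistics at will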
15. Updating Statistics at will
Be wary that updating Statistics will lead to recompilation of your queries!
16. Happy DBA’ing!
See you soon with another interesting SQL Server concept!
Until then …
Referenced from MSDN
iKosmik
http://www.sqlserverapp.com/
Follow us to get notified on SQL Server concepts and tidbits