This document proposes optimizing DRAM caches for latency rather than hit rate. It summarizes previous work on DRAM caches, such as the Loh-Hill Cache, that treated the DRAM cache much like an SRAM cache, which led to high hit latency and poor effective bandwidth.
The document then introduces the Alloy Cache design, which avoids tag serialization to reduce latency. It also proposes a Memory Access Predictor that selects between serial and parallel access models to balance latency against bandwidth usage. Simulation results show that the Alloy Cache with a predictor outperforms SRAM-tag designs, and that it delivers these benefits with only a small loss in hit rate, even for large caches.
Hardware-managed cache
1. 3-D Memory Stacking
3-D Stacked memory can provide large caches at high bandwidth
3D stacking enables a low-latency, high-bandwidth memory system
- E.g., half the latency and 8x the bandwidth [Loh & Hill, MICRO'11]
Stacked DRAM: a few hundred MB, not enough for main memory
Hardware-managed cache is desirable: Transparent to software
Source: Loh and Hill MICRO’11
2. Problems in Architecting Large Caches
Architecting a tag store with low latency and low storage overhead is challenging
Organizing at cache-line granularity (64 B) reduces wasted space and wasted bandwidth
Problem: Cache of hundreds of MB needs tag-store of tens of MB
E.g. 256MB DRAM cache needs ~20MB tag store (5 bytes/line)
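As a quick check of that figure: a 256MB cache holds 256MB / 64B = 4M lines, and at 5 bytes of tag per line the tag store is
\[
4\,\text{M lines} \times 5\,\text{B/line} = 20\,\text{MB}
\]
far more SRAM than is practical to provision on-chip.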
Option 1: SRAM Tags. Fast, but impractical (not enough transistors)
Option 2: Tags in DRAM. Naïve design has 2x latency (one access each for tag and data)
3. Loh-Hill Cache Design [MICRO'11, TopPicks]
Recent work tries to reduce the latency of the tags-in-DRAM approach
LH-Cache design is similar to a traditional set-associative cache
Cache organization: 29-way set-associative, one set per 2KB DRAM row (2KB row buffer = 32 cache lines: 3 lines of tags + 29 data lines)
Tags and data kept in the same DRAM row (tag store and data store together)
Data access is guaranteed a row-buffer hit (latency ~1.5x instead of 2x)
Speed up cache-miss detection: a MissMap (2MB) in L3 tracks which lines of each page are resident in the DRAM cache
4. Cache Optimizations Considered Harmful
Need to revisit DRAM cache structure given widely different constraints
DRAM caches are slow Don’t make them slower
Many “seemingly-indispensable” and “well-understood” design
choices degrade performance of DRAM cache:
• Serial tag and data access
• High associativity
• Replacement update
These optimizations are effective only under certain parameters/constraints
The parameters/constraints of a DRAM cache are quite different from SRAM
E.g., placing one set in an entire DRAM row → row-buffer hit rate ≈ 0%
6. Simple Example: Fast Cache (Typical)
Optimizing for hit-rate (at the expense of hit latency) is effective
Consider a system with a cache: hit latency 0.1, miss latency 1
Base hit-rate: 50% (base average latency: 0.55)
Opt-A removes 40% of misses (hit-rate: 70%) but increases hit latency by 40%
[Figure: average latency of Base Cache vs. Opt-A; break-even hit-rate = 52%, Opt-A hit-rate = 70%]
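Working through the numbers behind that break-even point (my arithmetic, not on the slide):
\[
\begin{aligned}
\text{Base: } & 0.5 \times 0.1 + 0.5 \times 1 = 0.55 \\
\text{Opt-A: } & 0.7 \times 0.14 + 0.3 \times 1 \approx 0.40 \\
\text{Break-even: } & h \times 0.14 + (1-h) \times 1 = 0.55 \;\Rightarrow\; h \approx 52\%
\end{aligned}
\]
With a fast cache, Opt-A only needs to push the hit-rate past ~52% to pay for its 40% slower hits, so at 70% it is a clear win.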
7. Simple Example: Slow Cache (DRAM)
Consider a system with a cache: hit latency 0.5, miss latency 1
Base hit-rate: 50% (base average latency: 0.75)
Opt-A removes 40% of misses (hit-rate: 70%) but increases hit latency by 40%
[Figure: average latency of Base Cache vs. Opt-A; break-even hit-rate = 83%, Opt-A hit-rate = 70%]
Optimizations that increase hit latency start becoming ineffective
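The same arithmetic for the slow cache (again mine, not on the slide):
\[
\begin{aligned}
\text{Base: } & 0.5 \times 0.5 + 0.5 \times 1 = 0.75 \\
\text{Opt-A: } & 0.7 \times 0.7 + 0.3 \times 1 = 0.79 \\
\text{Break-even: } & h \times 0.7 + (1-h) \times 1 = 0.75 \;\Rightarrow\; h \approx 83\%
\end{aligned}
\]
When hits already cost 0.5, making them 40% slower raises the average latency (0.79 > 0.75) unless the hit-rate reaches ~83%, which Opt-A's 70% does not.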
8. Overview of Different Designs
Our Goal: Outperform SRAM-Tags with a simple and practical design
For DRAM caches, critical to optimize first for latency, then hit-rate
9. What is the Hit Latency Impact?
Both SRAM-Tag and LH-Cache have much higher latency → ineffective
Consider isolated accesses: X always gives a row-buffer hit, Y needs a row activation
10. How about Bandwidth?
LH-Cache reduces effective DRAM cache bandwidth by > 4x
Configuration        Raw Bandwidth   Transfer Size on Hit   Effective Bandwidth
Main Memory          1x              64B                    1x
DRAM$ (SRAM-Tag)     8x              64B                    8x
DRAM$ (LH-Cache)     8x              256B + 16B             1.8x
DRAM$ (IDEAL)        8x              64B                    8x
For each hit, LH-Cache transfers:
• 3 lines of tags (3x64=192 bytes)
• 1 line for data (64 bytes)
• Replacement update (16 bytes)
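As a rough check of the 1.8x entry (my arithmetic): each hit moves 192 + 64 + 16 = 272 bytes to deliver 64 useful bytes, so
\[
\text{Effective BW} \approx 8 \times \frac{64}{272} \approx 1.9\times
\]
which is in line with the ~1.8x shown in the table.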
11. Performance Potential
LH-Cache gives 8.7%, SRAM-Tag 24%, latency-optimized design 38%
8-core system with 8MB shared L3 cache at 24 cycles
DRAM Cache: 256MB (shared), half the latency of off-chip memory
[Figure: Speedup over a system with no DRAM cache, for LH-Cache, SRAM-Tag, and IDEAL-Latency-Optimized]
12. De-optimizing for Performance
More benefits from optimizing for hit-latency than for hit-rate
LH-Cache uses LRU/DIP → needs replacement updates, consumes bandwidth
LH-Cache can be configured as direct-mapped → row-buffer hits
Configuration              Speedup   Hit-Rate   Hit-Latency (cycles)
LH-Cache                   8.7%      55.2%      107
LH-Cache + Random Repl.    10.2%     51.5%      98
LH-Cache (Direct Map)      15.2%     49.0%      82
IDEAL-LO (Direct Map)      38.4%     48.2%      35
14. Alloy Cache: Avoid Tag Serialization
Alloy Cache has low latency and uses less bandwidth
No dependent access for tag and data → avoids tag serialization
Consecutive lines in the same DRAM row → high row-buffer hit-rate
No need for separate "tag store" and "data store" → alloy tag and data together
One "Tag+Data" unit is read per access
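A minimal sketch of a direct-mapped, alloyed lookup from the cache controller's point of view. The 256MB capacity and 64B lines match the talk's configuration; the struct layout, field widths, and function names are illustrative assumptions rather than the paper's exact format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   64                        /* bytes per cache line        */
#define CACHE_SIZE  (256ull << 20)            /* 256MB stacked-DRAM cache    */
#define NUM_SETS    (CACHE_SIZE / LINE_SIZE)  /* direct-mapped: 1 line/set   */

/* Tag and data are stored adjacently ("alloyed"), so one DRAM burst
 * returns both; there is no separate tag array to consult first.           */
typedef struct {
    bool     valid;
    uint64_t tag;                 /* upper address bits                      */
    uint8_t  data[LINE_SIZE];
} TagAndData;

static TagAndData *dram_cache;    /* stand-in for the stacked-DRAM array     */

/* One access: read the single Tag+Data unit and compare the tag inline.    */
bool alloy_cache_read(uint64_t paddr, uint8_t out[LINE_SIZE])
{
    uint64_t line_addr = paddr / LINE_SIZE;
    uint64_t set       = line_addr % NUM_SETS;   /* index bits               */
    uint64_t tag       = line_addr / NUM_SETS;   /* remaining upper bits     */

    TagAndData tad = dram_cache[set];            /* single DRAM access       */
    if (tad.valid && tad.tag == tag) {           /* hit: data already here   */
        memcpy(out, tad.data, LINE_SIZE);
        return true;
    }
    return false;                                /* miss: fetch from memory  */
}
```

Because the tag arrives in the same burst as the data, the hit/miss decision costs no extra DRAM access; a set-associative layout would instead have to read the tag lines first (or speculatively read several ways) before knowing which data to return.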
15. Performance of Alloy Cache
Alloy Cache with a good predictor can outperform SRAM-Tag
[Figure: Speedup over a system with no DRAM cache, for Alloy Cache, Alloy+MissMap, Alloy+PerfectPred, and SRAM-Tag]
Alloy Cache with no early-miss detection gets 22%, close to SRAM-Tag
17. Cache Access Models
Each model has distinct advantage: lower latency or lower BW usage
Serial Access Model (SAM): higher miss latency, needs less BW
Parallel Access Model (PAM): lower miss latency, needs more BW
18. To Wait or Not to Wait?
Using Dynamic Access Model (DAM), we can get best latency and BW
Dynamic Access Model: Best of both SAM and PAM
When the line is likely to be present in the cache → use SAM, else → use PAM
[Diagram: on an L3 miss, the address goes to the Memory Access Predictor (MAP); prediction = cache hit → use SAM, prediction = memory access → use PAM]
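A sketch of the dynamic dispatch. The controller hooks `issue_cache_access()` and `issue_memory_access()` are hypothetical names, and `map_predict_hit()` is the predictor sketched after slide 19.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller hooks (assumed names, not from the talk). */
extern bool issue_cache_access(uint64_t paddr);  /* true on DRAM-cache hit  */
extern void issue_memory_access(uint64_t paddr);
extern bool map_predict_hit(uint64_t miss_pc);   /* sketched after slide 19 */

/* Dynamic Access Model: pick SAM or PAM for each L3 miss. */
void handle_l3_miss(uint64_t paddr, uint64_t miss_pc)
{
    if (map_predict_hit(miss_pc)) {
        /* SAM: probe the DRAM cache first; go to memory only on a miss.   */
        if (!issue_cache_access(paddr))
            issue_memory_access(paddr);
    } else {
        /* PAM: probe the DRAM cache and main memory in parallel; lower
         * miss latency, but the memory access is wasted on a cache hit.   */
        issue_cache_access(paddr);
        issue_memory_access(paddr);
    }
}
```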
19. Memory Access Predictor (MAP)
Proposed MAP designs are simple and low latency
We can use hit-rate as a proxy: high hit-rate → use SAM, low hit-rate → use PAM
Accuracy is improved with history-based prediction
1. History-Based Global MAP (MAP-G)
• Single saturating counter per core (3-bit)
• Increment on cache hit, decrement on miss
• MSB indicates SAM or PAM
2. Instruction-Based MAP (MAP-PC)
• Table of saturating counters, indexed by the miss-causing PC
• Table of 256 entries is sufficient (96 bytes)
[Diagram: the miss PC indexes a table of 3-bit counters in the MAP]
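A minimal sketch of both predictors, following the counter behavior described above. The 3-bit width and 256-entry table come from the slides; the interface names and the exact PC hashing are my assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_BITS    3
#define MAP_MAX     ((1u << MAP_BITS) - 1)    /* 3-bit counters saturate at 7 */
#define MAP_ENTRIES 256                       /* 256 x 3 bits = 96 bytes      */

static uint8_t map_global;                    /* MAP-G: one counter per core  */
static uint8_t map_pc[MAP_ENTRIES];           /* MAP-PC: table of counters    */

static unsigned pc_index(uint64_t pc)
{
    return (unsigned)(pc >> 2) % MAP_ENTRIES; /* assumed hash of the miss PC  */
}

static void bump(uint8_t *ctr, bool cache_hit)
{
    if (cache_hit  && *ctr < MAP_MAX) (*ctr)++;   /* increment on cache hit   */
    if (!cache_hit && *ctr > 0)       (*ctr)--;   /* decrement on cache miss  */
}

/* MAP-PC prediction: counter MSB set -> predict cache hit (use SAM),
 * else predict memory access (use PAM). MAP-G would instead read the
 * MSB of map_global.                                                         */
bool map_predict_hit(uint64_t miss_pc)
{
    return (map_pc[pc_index(miss_pc)] >> (MAP_BITS - 1)) & 1;
}

/* Train both predictors with the actual outcome of the DRAM-cache access.   */
void map_update(uint64_t miss_pc, bool cache_hit)
{
    bump(&map_global, cache_hit);
    bump(&map_pc[pc_index(miss_pc)], cache_hit);
}
```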
20. Predictor Performance
Simple Memory Access Predictors obtain almost all of the potential gains
[Figure: Speedup over a system with no DRAM cache, for Alloy+NoPred, Alloy+MAP-Global, Alloy+MAP-PC, and Alloy+PerfectMAP]
Accuracy of MAP-Global: 82%; accuracy of MAP-PC: 94%
Alloy Cache with MAP-PC gets 35%, Perfect MAP gets 36.5%
21. Hit-Latency versus Hit-Rate
DRAM Cache Hit Latency
                           LH-Cache   SRAM-Tag   Alloy Cache
Average Latency (cycles)   107        67         43
Relative Latency           2.5x       1.5x       1.0x
DRAM Cache Hit Rate
Cache Size   LH-Cache (29-way)   Alloy Cache (1-way)   Delta Hit-Rate
256MB        55.2%               48.2%                 7%
512MB        59.6%               55.2%                 4.4%
1GB          62.6%               59.1%                 2.5%
Alloy Cache reduces hit latency greatly at a small loss of hit-rate
23. Summary
DRAM Caches are slow, don’t make them slower
Previous research: DRAM cache architected similarly to an SRAM cache
Insight: optimize the DRAM cache first for latency, then for hit-rate
Latency-optimized Alloy Cache avoids tag serialization
Memory Access Predictor: simple, low latency, yet highly effective
Alloy Cache + MAP outperforms SRAM-Tags (35% vs. 24%)
Calls for new ways to manage DRAM cache space and bandwidth