The document discusses the multicore midlife crisis as processors move to multiple cores to cope with Moore's Law. As core counts increase, the memory bandwidth does not scale accordingly, creating a memory wall problem. Solutions proposed include increasing cache sizes, improving memory speeds, and better caching techniques. Future multicore designs may focus more on heterogeneous cores tailored for different workloads rather than increasing core counts uniformly. Research challenges include coping with heterogeneity, improving data locality given slow memory speeds, and software techniques to help address issues like cache coherence.
6. So What?
Yep, they improved the cache size. Do I care?
The interesting part is why they did it.
5/4/11 6
7. The Memory Problem
• Moore’s Law: the number of transistors doubles every 18 months
– Singlecore: new transistors = faster speed
– Multicore: new transistors = more cores
• Memory speed increase does not obey Moore’s Law!
[Figure: processor with four cores and a cache, connected to memory]
8. The Memory Problem
• Problem: More cores compete for the same slow memory!
• Implications:
[Figure: pipeline stages IF, ID, X, M, W; instructions queue in IF and ID while the pipeline is stalled]
– Stalled on the M access to cache (5 cycles) or RAM (> 100 cycles)
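To make the stall cost concrete, here is a minimal sketch of the expected cycles per memory access. It assumes a single cache level and a simplified model in which a hit costs 5 cycles and a miss costs 100 cycles in total (the slide only says "> 100"); the function name and the sample hit rates are illustrative, not taken from the talk.

```python
def average_access_cycles(hit_rate, cache_cycles=5, ram_cycles=100):
    """Expected cycles per memory access for one cache level.

    Simplified model (an assumption): a hit costs cache_cycles and a
    miss costs ram_cycles in total.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_cycles + (1.0 - hit_rate) * ram_cycles


if __name__ == "__main__":
    # Illustrative hit rates only; real values depend on the workload.
    for rate in (0.99, 0.95, 0.90):
        print(f"hit rate {rate:.2f}: {average_access_cycles(rate):.2f} cycles")
```

Even under this rough model, a few percentage points of lost hit rate more than double the average access cost, which is why more cores contending for one cache hurts so much.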
9. The Memory Problem
• Problem: More cores compete for the same slow memory!
• Solution: Increase cache size :)
– Maintain cache hit rate
• Halving the miss rate requires roughly 4x the cache size
• Exponential increase in #transistors needed
– Cache coherence overhead
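The 4x figure above is consistent with the common square-root rule of thumb, under which the miss rate falls roughly as a power law of cache size. The sketch below assumes that model with exponent alpha = 0.5; the slide does not state the model, so both the exponent and the function name are assumptions.

```python
def cache_growth_factor(miss_reduction, alpha=0.5):
    """Factor by which the cache must grow to divide the miss rate
    by miss_reduction, assuming miss_rate ~ cache_size ** (-alpha).

    alpha = 0.5 is the square-root rule of thumb (an assumption here).
    """
    if miss_reduction <= 0 or alpha <= 0:
        raise ValueError("miss_reduction and alpha must be positive")
    return miss_reduction ** (1.0 / alpha)


if __name__ == "__main__":
    for reduction in (2, 4, 8):
        factor = cache_growth_factor(reduction)
        print(f"miss rate / {reduction} -> cache size x {factor:.0f}")
```

Under this assumption each further halving of the miss rate multiplies the required cache size by four again, so the transistor budget grows very quickly.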
10. Increasing Cache Size
Not practical!
B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009
12. Do We Need All These Cores?
• Average utilization: < 20%
• We don’t have too many parallel apps
• We just have enough compute power
• Until you try to encode an HD video
– Star Trek holodecks: not there yet
• CPU vendors still have to make a living
14. Tomorrow’s Multicore
• Intel Core i3, i5, i7
– Video is integrated into CPU
– Must balance sequential and parallel performance
– Lower energy requirements than prev. generations
• Heterogeneous cores
– Many slow cores, good at floating point
– Some general purpose cores
– “Combine” cores into super-cores
• Must live with the memory problems
17. Tomorrow’s Multicore
• The number of cores is becoming less important
– They can’t keep increasing them
– i3, i5, i7: how many cores each?
• What matters is what the system provides
– FLOP intensive: GPU-style cores
– I/O intensive: FAWN (CMU)
– Memory intensive: Opteron/Xeon NUMA servers
18. A Research Perspective
• Coping with heterogeneity is hard
– Different degrees of parallelism have different sequential execution speeds
– Many tradeoffs: speed vs. energy vs. memory intensity vs. I/O intensity
• Need models for heterogeneity
– Understand the cost of the applications in terms of FLOPS, INTOPS, memory, I/O etc.
• Silver lining: stick to sequential apps (?)
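One very simple shape such a model could take is sketched below: describe a workload by its floating-point operations, integer operations, memory traffic and I/O traffic, describe a machine by its peak rate for each, and take the slowest resource as the bottleneck. This assumes perfect overlap between resources (a roofline-style bound); the class names, field names and all numbers are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    flops: float      # floating-point operations
    int_ops: float    # integer operations
    mem_bytes: float  # bytes moved to/from memory
    io_bytes: float   # bytes moved to/from storage or network


@dataclass
class Machine:
    flops_per_s: float
    int_ops_per_s: float
    mem_bytes_per_s: float
    io_bytes_per_s: float


def bottleneck(work, machine):
    """Return (resource name, seconds) for the slowest resource.

    Assumes all resources overlap perfectly, so the run time is
    bounded by the single most loaded resource.
    """
    times = {
        "flops": work.flops / machine.flops_per_s,
        "int_ops": work.int_ops / machine.int_ops_per_s,
        "memory": work.mem_bytes / machine.mem_bytes_per_s,
        "io": work.io_bytes / machine.io_bytes_per_s,
    }
    name = max(times, key=times.get)
    return name, times[name]


if __name__ == "__main__":
    # Hypothetical machine and workload, for illustration only.
    gpu_like = Machine(1e12, 1e11, 1e11, 1e8)
    job = Workload(flops=1e12, int_ops=1e9, mem_bytes=4e11, io_bytes=1e6)
    print(bottleneck(job, gpu_like))
```

With these made-up numbers the job is memory bound even on a FLOP-heavy machine, which is the kind of mismatch a heterogeneity model is meant to expose.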
19. A Research Perspective
• Coping with slow memory
• Need to improve data locality by orders of magnitude
• Compiler support, auto-tuners etc.
• Space-efficient data types:
• HOT area in algo & systems
• Bloom filters: NSDI’10: 3 papers!
• Succinct data structures: STOC’08-STOC’10
• Cache oblivious algorithms
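As an illustration of the space-efficient data types mentioned above, here is a minimal Bloom filter sketch. It stores set membership in a fixed bit array, trading a small false-positive rate for far less memory than a hash table of the items themselves. The bit-array size, the number of hashes and the use of salted SHA-256 are illustrative choices, not something the talk prescribes.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            # Derive independent-looking hashes by prefixing a counter.
            digest = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("core0")
    seen.add("core1")
    print("core0" in seen)  # an added item is always reported as present
```

The whole filter here occupies 128 bytes regardless of how long the stored strings are, so it is much more likely to stay in cache than the set it summarises.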
20. A Research Perspective
• Software-helped cache coherence
– Or go without it :)
• Renounce some programming patterns
• Java initializes all objects to some value…
• Rethink those hash tables
• Go for approximate solutions
– It’s better if you can provide error bounds
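A small sketch of what an approximate solution with an error bound can look like: estimate the mean of a large collection from a random sample instead of touching every element, and report a Hoeffding bound on the error. This assumes all values lie in [0, 1] and that sampling is with replacement; the function names and the choice of Hoeffding's inequality are illustrative, not from the talk.

```python
import math
import random


def hoeffding_radius(sample_size, delta):
    """Half-width t such that |estimate - true mean| <= t with
    probability at least 1 - delta, for values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))


def approximate_mean(values, sample_size, delta=0.05, rng=None):
    """Estimate the mean from a random sample (with replacement).

    Returns (estimate, radius). Assumes every value is in [0, 1].
    """
    rng = rng or random.Random(0)
    sample = [rng.choice(values) for _ in range(sample_size)]
    estimate = sum(sample) / sample_size
    return estimate, hoeffding_radius(sample_size, delta)


if __name__ == "__main__":
    data = [i / 9999 for i in range(10000)]  # values spread over [0, 1]
    est, radius = approximate_mean(data, sample_size=500)
    print(f"mean is about {est:.3f}, error bound {radius:.3f}")
```

The sample touches 500 elements instead of 10,000, and quadrupling the sample size only halves the error bound, which is the usual price of this kind of approximation.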
21. Discussion
Thank you for your attention