IBM Platform LSF provides an intelligent architecture for scheduling workloads across technical computing clusters. Its modular design separates key elements like scheduling policies and resource management. LSF uses a master-slave model where the master node intelligently schedules jobs to slave nodes based on workload requirements and node resource availability. Core components like the LIM, PIM, and RES help distribute work across heterogeneous resources, while the LSF scheduler supports multiple concurrent policies aligned with business needs. Together this architecture optimizes shared resource utilization to maximize productivity.
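The dispatch idea described above, a master placing jobs on nodes according to job requirements and node availability, can be illustrated with a minimal sketch. This is not LSF's scheduler or API: the `Host`, `Job`, and `dispatch` names, the two resources tracked (slots and memory), and the "most free capacity first" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical view of a node's available resources.
    name: str
    free_slots: int
    free_mem_gb: float


@dataclass
class Job:
    # Hypothetical job resource requirements.
    name: str
    slots: int
    mem_gb: float


def dispatch(jobs, hosts):
    """Place each job on the eligible host with the most free capacity.

    Returns {job name: host name, or None if no host can run the job}.
    Host objects are updated in place as resources are consumed.
    """
    placement = {}
    for job in jobs:
        # A host is eligible only if it satisfies every requirement.
        candidates = [
            h for h in hosts
            if h.free_slots >= job.slots and h.free_mem_gb >= job.mem_gb
        ]
        if not candidates:
            placement[job.name] = None  # job stays pending
            continue
        best = max(candidates, key=lambda h: (h.free_slots, h.free_mem_gb))
        best.free_slots -= job.slots
        best.free_mem_gb -= job.mem_gb
        placement[job.name] = best.name
    return placement
```

A real workload manager layers queues, priorities, and multiple policies on top of this kind of eligibility check; the sketch shows only the matching step.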
Abstract - Various aspects of three proposed architectures for distributed software are examined. There is a crucial, permanent and ongoing need to create an ideal model of an optimal architecture that meets the organization's needs for flexibility, extensibility and integration, and that fully supports the potential talents, processes and opportunities in corporations. The excellence of the proposed architecture is demonstrated by presenting a rigorous scenario-based proof of the adaptability and compatibility of the architecture in cases of merging and varying organizations, where the whole structure of hierarchies is revised.
Keywords: ERP, Data-centric architecture, Component-based architecture, Plug-in architecture, distributed systems
PROGNOZ Platform is a fully integrated BI platform that provides easy-to-use tools for constructing business applications with a broad range of analytic capabilities.
Key features:
1. One of the best ever designs for a BI platform
2. Entirely new user-friendly interface
3. Advanced Web-based capabilities for creating reports, dashboards, and scorecards using different kinds of visualization tools (charts, maps, gauges)
4. Modeling, forecasting, and time series analysis
5. Enhanced mobile capabilities (Apple iOS)
6. Collaboration tools: Integration with social media (Facebook, Twitter, and so on)
7. Portal integration tools (Microsoft SharePoint, SAP NetWeaver, IBM WebSphere)
8. Cross-platform application server (Windows and Linux)
9. Enhanced mapping visualizations with Web mapping services (Google Maps, Microsoft Bing, OpenStreetMap, ArcGIS) and 3D technology
10. Common security, metadata, administration, portal integration, object model, and query engine for all platform components
We compare the traditional ETL approach to the newer Business Rules-driven E-LT paradigm and answer whether conventional ETL tools should be considered obsolete and phased out of the Enterprise Architecture, with tools based on Business Rules and E-LT taking their place.
API Enablement on Mainframes. How to API-enable mainframe applications and services. How to integrate mainframe services and applications with mobile, cloud and external apps. This white paper covers a couple of patterns to API-enable mainframe-based applications and services.
This presentation provides a high-level overview of BPM and where it is today.
It also touches on some of the core technologies and standards.
Its focus is on four specific “Challenges” facing BPM, which are aligned to the four phases of the typical application development life cycle.
1. Discovery
2. Design
3. Development
4. Deployment
ERP software is good for reconciling financials, creating sales forecasts, maintaining order volumes and increasing customer satisfaction. Yes, it can be done using 4-5 different platforms, but managing the data in one place is easier than in several places.
The Evolution of Enterprise Resource Planning Systems - IJAEMSJORNAL
Management of organizations needs efficient information systems to improve competitiveness through cost reduction and better logistics. Large enterprises and small to medium-size enterprises (SMEs) alike recognize that the capability of providing the right information at the right time brings tremendous rewards to organizations in a globally competitive world of complex business practices. ERP (Enterprise Resource Planning) can be defined as a framework for organizing, defining and standardizing the business processes necessary to effectively plan and control an organization so the organization can use its internal knowledge to seek external advantage. This paper presents the growth and success of ERP adoption and development through history. The evolution of ERP systems closely followed the spectacular developments in the field of computer hardware and software systems. The ERP market remains in a never-ending process of reengineering and development, bringing new products and solutions. Consolidations continue to occur and the key players continue to build out their products. The next phase of ERP systems will be the merged products.
In this presentation we will discuss the basics of ERP systems and their structure. We will also talk about client-server functions.
To know more about Welingkar School’s Distance Learning Program and courses offered, visit: http://www.welingkaronline.org/distance-learning/online-mba.html
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j - Neo4j
We live in a profoundly connected world. From supply chains to payment networks to digital business and complex portfolios, our ability to understand and navigate not just data, but relationships inside the data, plays an increasingly important role in all aspects of business. Highly connected value chains that generate massive volumes of connected data create an opportunity for graph analysis, which Gartner describes as "the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions." This talk will introduce the power of graph databases and share how the latest IBM Power Systems offerings featuring the POWER8 processor and CAPI-attached Flash enable unique scaling, performance and price-performance advantages for Neo4j workloads.
The Neo4j Graph database was lacking a declarative query language.
We wanted to add a humane query language which is easy to read and understand. It borrows from other languages like SQL and SPARQL but brings its own flavor. Cypher uses ASCII art to describe the graph patterns that you're looking for.
We used Scala's parser combinator library in combination with functional approaches and lazy evaluation to develop the Cypher query language.
The talk describes the internals of the Cypher implementation.
Deploying Massive Scale Graphs for Realtime Insights - Neo4j
Graph databases have been at the forefront of helping organizations manage and generate insights from data relationships, and apply those insights in real time to drive competitive advantage. As organizations gain value from deploying graph databases, the data volumes managed are growing exponentially, pushing the limits of large-scale in-memory graph processing. Neo4j and IBM Power Systems combined forces to deliver a market-leading scalable graph database platform capable of affordably storing and processing graphs of extremely large size and offering real-time insights, using flash and FPGA accelerators. In this session we will cover the use cases driving the need for this extremely scalable platform and how this platform offers an easy-to-deploy model for extreme-scale graph databases.
This presentation introduces the graph model as the obvious choice for rich and connected data. Graph databases are a category of open-source NoSQL datastores which are specialized in storing, handling and querying graph structures efficiently.
Use cases illustrate the applicability of the graph model across many domains.
Neo4j, as the most widely used graph database, supports the property graph model, which is explained in detail.
Querying a graph database calls for a query language tailored to graph patterns that is powerful and expressive yet friendly and easy to understand. Neo4j's Cypher is such a query language, developed from the ground up to support expressing challenging use cases in a comprehensive way.
A series of examples rounds out the presentation to apply the lessons learned.
Introducing Neo4j 3.1: New Security and Clustering Architecture - Neo4j
Neo4j 3.1, now in public beta, introduces many exciting new features. It improves upon existing security features to provide enterprise-class user management, including role-based authentication and AD/LDAP integration. The release introduces a new clustering architecture called Causal Clustering that enables very large clusters of Neo4j to be deployed across data centers while maintaining the data integrity that is critical for the property graph model. Other highlights include database kernel and operations advances, user-defined functions, a new Cypher command line interface, and Neo4j Browser improvements.
In this webinar we will cover these new features in detail, including a live demo where we will show how to deploy a Neo4j 3.1 cluster and manage users using the new security features.
Knowledge Architecture: Graphing Your Knowledge - Neo4j
Ask any project manager and they will tell you the importance of reviewing lessons learned prior to starting a new project. Lessons-learned databases are filled with nuggets of valuable information to help project teams increase the likelihood of project success. Why then do most lessons-learned databases go unused by project teams? In my experience, they are difficult to search through and require hours of time to review the result set.
Recently I had a project engineer ask me if we could search our lessons learned using a list of 22 key terms the team was interested in. Our current keyword search engine would require him to enter each term individually, select the link, and save the document for review. Also, there was no way to search only the database; the query would search our entire corpus, close to 20 million URLs. This would not do. I asked our search team if they would run a special query against the lessons-learned database only, using the terms provided. They returned a spreadsheet with a link to each document containing the terms. The engineer had his work cut out for him: over 1100 documents were on the list.
I started thinking there had to be a better way. I had been experimenting with topic modeling, in particular to assist our users in connecting seemingly disparate documents through an easier visualization mechanism. Something better than a list of links on multiple pages. I gathered my toolbox: R/RStudio, for the topic modeling and exploring the data; Neo4j, for modeling and visualizing the topics; and Linkurious, a web front end for our users to search and visualize the graph database.
An Introduction to Container Organization with Docker Swarm, Kubernetes, Meso... - Neo4j
Interest in Docker has increased significantly since its inception. According to a report compiled by a leading cloud-scale monitoring company, Datadog, two-thirds of the companies that try Docker adopt it, and the adopters have increased their container count by five times over a period of nine months. Neo4j has also embraced Docker by supporting official images and also offering specific images of its own.
While interest in container technology is growing rapidly, so is the need to deploy containers over a cluster of machines to allow scalability and fault tolerance. This highlights the need for orchestration, which refers to automating the otherwise manual process of deploying, configuring and scaling containers.
In this talk, we provide a hands-on introduction to the three most popular Docker orchestration tools: Kubernetes, Docker Swarm and Mesos. This talk offers a conceptual understanding of each of these technologies along with an insight into the concepts learned through a series of three demos. The demos will illustrate how to deploy and automatically scale a Neo4j container using each of the three orchestration platforms.
We realize that the range of orchestration tools is too broad to cover in full. We chose these three tools for two reasons: first, their potential use in our cluster at Cincinnati Children’s Hospital (CCHMC); second, they are among the leading orchestration tools.
Importing Data into Neo4j quickly and easily - StackOverflow - Neo4j
In this GraphConnect presentation Mark and Michael show several ways to import large amounts of highly connected data from different formats into Neo4j. Both Cypher's LOAD CSV and the bulk importer are demonstrated, along with many tips.
We use the well-known StackOverflow Q&A site data, which is, interestingly, very graphy.
Graph Database Management Systems provide an effective
and efficient solution to data storage in current scenarios
where data are more and more connected, graph models are
widely used, and systems need to scale to large data sets.
In this framework, the conversion of the persistence layer of
an application from a relational to a graph data store can
be convenient, but it is usually a hard task for database
administrators. In this paper we propose a methodology
to convert a relational to a graph database by exploiting
the schema and the constraints of the source. The approach
supports the translation of conjunctive SQL queries over the
source into graph traversal operations over the target. We
provide experimental results that show the feasibility of our
solution and the efficiency of query answering over the target
database.
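As a rough illustration of the idea in this abstract, the sketch below maps rows to nodes and foreign-key references to edges, then answers a simple join by following edges. It is not the paper's methodology, which exploits the schema and constraints in ways the abstract does not detail; the function names, the input layout, and the row-per-node mapping are all assumptions.

```python
def relational_to_graph(tables, foreign_keys):
    """Naive relational-to-graph mapping (illustrative assumption).

    tables:       {table: {"pk": column, "rows": [dict, ...]}}
    foreign_keys: [(source_table, fk_column, target_table), ...]
    Returns (nodes, edges): nodes maps (table, pk_value) to the row's
    non-foreign-key properties; edges are (source, label, target).
    """
    nodes = {}
    for name, table in tables.items():
        fk_cols = {col for src, col, _ in foreign_keys if src == name}
        for row in table["rows"]:
            props = {k: v for k, v in row.items() if k not in fk_cols}
            nodes[(name, row[table["pk"]])] = props

    edges = []
    for src, col, dst in foreign_keys:
        pk = tables[src]["pk"]
        for row in tables[src]["rows"]:
            target = (dst, row.get(col))
            if row.get(col) is not None and target in nodes:
                edges.append(((src, row[pk]), col, target))
    return nodes, edges


def follow(edges, node_id, label):
    """Traverse outgoing edges with the given label (stands in for a join)."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]
```

With an `employee` table referencing `department` through a `dept_id` column, the SQL join between the two becomes `follow(edges, ("employee", 10), "dept_id")`.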
Ready to leverage the power of a graph database to bring your application to the next level, but all the data is still stuck in a legacy relational database?
Fortunately, Neo4j offers several ways to quickly and efficiently import relational data into a suitable graph model. It's as simple as exporting the subset of the data you want to import and either ingesting it with an initial loader in seconds or minutes, or applying Cypher's power to put your relational data transactionally in the right places of your graph model.
In this webinar, Michael will also demonstrate a simple tool that can load relational data directly into Neo4j, automatically transforming it into a graph representation of your normalized entity-relationship model.
Big Graph Analytics on Neo4j with Apache Spark - Kenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and subsequently updated with the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
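To make one of the analyses named in this talk concrete, here is a minimal single-machine sketch of PageRank by power iteration. It is plain Python, not Spark GraphX and not the Docker container from the talk; the `pagerank` function, its parameters, and the default damping factor of 0.85 are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores for a directed graph.

    edges: iterable of (source, target) pairs.
    Returns {node: score}; scores sum to 1.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out[n]:
                new_rank[target] += damping * rank[n] / len(out[n])
        rank = new_rank
    return rank
```

A distributed engine partitions the same per-edge contribution step across machines; the arithmetic per iteration is unchanged.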
In this webinar we'll explore a data set using Neo4j and Cypher and compare the approach we might take with a relational database and SQL. We'll cover the following topics: modeling the data set; importing the data; querying the data; and evolving the model and queries as the data changes.
The F5 Networks Application Services Reference Architecture (White Paper) - F5 Networks
Build elastic, flexible application delivery fabrics that are ready to meet the challenges of optimizing and securing applications in a constantly evolving environment.
This whitepaper details the use of High Performance Computing (HPC) in Aerospace & Defense, Earth Sciences, Education and Research, and Financial Services, among others.
In 2010, transtec entered into a strategic partnership with IBM, surely one of the biggest players in the HPC world with a very strong brand. The flexibility and long-standing experience of transtec, combined with the power and quality of IBM HPC systems, constitute a perfect symbiosis and provide customers with the optimal HPC solution. IBM iDataPlex systems are highly optimized for HPC workloads in datacenter environments with regard to performance, flexibility, and energy, space and cooling efficiency.
Platform HPC and LSF are both enterprise-ready HPC cluster and workload management solutions and are widespread in all kinds of industrial HPC environments.
Choosing a transtec HPC solution means opting for the most intensive customer care and best service in HPC. Our experts will be glad to bring in their expertise and support to assist you at any stage, from HPC design to daily cluster operations, to HPC Cloud Services.
Last but not least, transtec HPC Cloud Services provide customers with the possibility to have their jobs run on dynamically provided nodes in a dedicated datacenter, professionally managed and individually customizable. Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise ready cloud management environment, and ready to run.
Have fun reading the transtec HPC Compass 2013/14 IBM Special!
Business success increasingly depends on
the ability to apply new and innovative
business models and supporting IT solutions
more quickly than one's competitors can. In
short, it requires an agile enterprise. Recent
advancements in cloud computing can provide
an enterprise with the essential capabilities it
needs to become an agile enterprise.
Cloud computing is the hottest topic in IT. It is virtually impossible to read a trade publication or
attend an IT conference and not be overwhelmed by discussions of the advantages and benefits
of cloud computing. In spite of all of the interest, there is still considerable confusion and
disagreement within the IT industry about the definition of cloud computing. The Cloud
Computing Journal, for example, published an article that included 21 definitions of cloud
computing.
Though there is confusion about the definition, the goal of cloud computing is quite clear – to
achieve an order of magnitude improvement in the cost-effective, elastic provisioning and
delivery of IT services.
Understanding the then and now of Enterprise Management Systems.pdf - Anil
Enterprise Management Systems (EMS), also known as Enterprise Resource Planning (ERP) systems, have evolved significantly over the years. Understanding the "then" and "now" of EMS can provide insights into the transformation of business processes and technology.
Learn about IBM PureFlex System: The Future of Data Center Management. The ‘Expert Integrated System’ delivers a combination of hardware, software and built-in expertise that makes implementing and applying the power of computing simpler, easier, faster and more effective than ever before. For more information, visit http://ibm.co/J7Zb1v.
Learn about IBM PureFlex System: The Future of Data Center Management. The system has the unique management capabilities of the IBM Flex System Manager. IBM’s goals in designing and building these systems were to return agility, efficiency, simplicity and control to data center operations. To know more, visit http://ibm.co/J7Zb1v.
Application Modernization With Cloud Native Approach_ An in-depth Guide.pdf - basilmph
Application modernization means taking outdated applications and upgrading their
platform infrastructure, internal systems, and the way they are used. The
advantages of application modernization can be summarized as increasing the
speed with which new features are delivered, exposing the functionality of existing
applications to be consumed via API by other services, and re-platforming applications from on-premises to cloud-native.
High-performance computing (HPC) systems
play a significant role in many highly
computational applications and systems. Understanding the
failure behavior of such massively parallel systems is essential to
achieving high utilization of large systems. This process
requires continuous on-line monitoring and analysis of all
incidents generated in the system, including long-term normal
notifications, performance metrics, and failures. This article
illustrates the significance of HPC systems (HPC-S) and their fault
tolerance, especially in rule-based systems. We explore efficient
fault-tolerance (FT) mechanisms and fault prediction methods for
distributed systems with different
rules. We also analyze the progress of HPC in rule-based
distributed systems and its future development direction.
Similar to How the IBM Platform LSF Architecture Accelerates Technical Computing (20)
This IBM Redpaper provides a brief overview of OpenStack and basic familiarity with its usage with the IBM XIV Storage System Gen3. The illustration scenario that is presented uses the OpenStack Folsom release IaaS implementation with Ubuntu Linux servers and the IBM Storage Driver for OpenStack. For more information on IBM Storage Systems, visit http://ibm.co/LIg7gk.
Visit http://bit.ly/KWh5Dx to 'Follow' the official Twitter handle of IBM India Smarter Computing.
Learn how all-flash needs end-to-end storage efficiency. For more information on IBM FlashSystem, visit http://ibm.co/10KodHl.
Learn about vSphere Storage API for Array Integration on the IBM Storwize family. IBM Storwize V7000 Unified combines the block storage capabilities of Storwize V7000 with file storage capabilities into a single system for greater ease of management and efficiency. For more information on IBM Storage Systems, visit http://ibm.co/LIg7gk.
Learn about IBM FlashSystem 840 and its complete product specification in this Redbook. FlashSystem 840 provides scalable performance for the most demanding enterprise class applications. IBM FlashSystem 840 accelerates response times with IBM MicroLatency to enable faster decision making. For more information on IBM FlashSystem, visit http://ibm.co/10KodHl.
Visit http://on.fb.me/LT4gdu to 'Like' the official Facebook page of IBM India Smarter Computing.
Learn about the IBM System x3250 M5. The x3250 M5 offers energy-efficiency features to save energy, reduce operational costs, increase energy availability, and contribute to a green environment; for example, energy-efficient planar components help lower operational costs. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210746104/IBM-System-x3250-M5
This Redbook talks about the product specification of IBM NeXtScale nx360 M4. The NeXtScale nx360 M4 server provides a dense, flexible solution with a low total cost of ownership (TCO). The half-wide, dual-socket NeXtScale nx360 M4 server is designed for data centers that require high performance but are constrained by floor space. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210745680/IBM-NeXtScale-nx360-M4
Learn about IBM System x3650 M4 HD, which is a 2-socket 2U rack-optimized server. This powerful system is designed for your most important business applications and cloud deployments. Outstanding RAS and high-efficiency design improve your business environment and help save operational costs. For more information on System x, visit http://ibm.co/Q7m3iQ.
Here is the product specification for IBM System x3300 M4. This product can be managed remotely. The x3300 M4 server contains IBM IMM2, which provides advanced service-processor control, monitoring, and an alerting function. The IMM2 lights LEDs to help you diagnose the problem, records the error in the event log, and alerts you to the problem. For more information on System x, visit http://ibm.co/Q7m3iQ.
Learn about IBM System x iDataPlex dx360 M4. IBM System x iDataPlex is an innovative data center solution that maximizes performance and optimizes energy and space efficiency. The iDataPlex solution provides customers with outstanding energy and cooling efficiency, multi-rack level manageability, complete flexibility in configuration, and minimal deployment effort. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210744055/IBM-System-x-iDataPlex-dx360-M4
This Redbook talks through the benefits and product specification of IBM System x3500 M4. The x3500 M4 offers a flexible, scalable design and simple upgrade path to 32 HDDs, with up to eight PCIe 3.0 slots and up to 768 GB of memory. A high-performance dual-socket tower server, the IBM System x3500 M4, can deliver the scalability, reliable performance, and optimized efficiency for your business-critical applications. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210742768/IBM-System-x3500-M4
Learn about the system specifications for IBM System x3550 M4. The x3550 M4 offers numerous features to boost performance, improve scalability, and reduce costs. It improves productivity by offering superior system performance with up to 12-core processors, up to 30 MB of L3 cache, and up to two 8 GT/s QPI interconnect links. For more information on System x, visit http://ibm.co/Q7m3iQ.
Learn about IBM System x3650 M4. The x3650 M4 is an outstanding 2U two-socket business-critical server, offering improved performance and pay-as-you grow flexibility along with new features that improve server management capability. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210741926/IBM-System-x3650-M4
Learn about the product specification of IBM System x3500 M3. System x3500 M3 has an energy-efficient design which works in conjunction with the IMM to govern fan rotation based on the readings that it delivers. This saves money under normal conditions because the fans do not have to spin at high speed. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210741626/IBM-System-x3500-M3
Learn about IBM System x3400 M3. The x3400 M3 offers numerous features to boost performance and reduce costs, and it can grow with your application requirements. Powerful systems management features simplify local and remote management of the x3400 M3. For more information on System x, visit http://ibm.co/Q7m3iQ.
Learn about IBM System x3250 M3, a single-socket server that offers new levels of performance and flexibility to help you respond quickly to changing business demands. Cost-effective and compact, it is well suited to small to mid-sized businesses, as well as large enterprises. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210740347/IBM-System-x3250-M3
Learn about IBM System x3200 M3 and its specifications. The System x3200 M3 features easy installation and management with a rich set of options for hard disk drives and memory. The efficient design helps to save energy and provide a better work environment with less heat and noise. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210739508/IBM-System-x3200-M3
Learn about the configuration of IBM PowerVC. IBM PowerVC is built on OpenStack, which controls large pools of server, storage, and networking resources throughout a data center. IBM Power Virtualization Center provides security services that support a secure environment. Installation requires just 20 minutes to get a virtual machine up and running. For more information on Power Systems, visit http://ibm.co/Lx6hfc.
Learn about IBM POWER7 Virtualization Performance. PowerVM Lx86 is a cross-platform virtualization solution that enables the running of a wide range of x86 Linux applications on Power Systems platforms within a Linux on Power partition without modifications or recompilation of the workloads. For more information on Power Systems, visit http://ibm.co/Lx6hfc.
http://www.scribd.com/doc/210734237/A-Comparison-of-PowerVM-and-Vmware-Virtualization-Performance
Learn about IBM PureFlex System and VMware vCloud Enterprise Suite. The IBM PureFlex System platform has been used to meet the hardware requirements in support of this reference architecture. It supplies all the components required to support vCloud Suite (including computing, networking, storage, and management interfaces). For more information on Pure Systems, visit http://ibm.co/J7Zb1v.
http://www.scribd.com/doc/210719868/IBM-pureflex-system-and-vmware-vcloud-enterprise-suite-reference-architecture
Learn how X6, the sixth generation of EXA Technology, is fast, agile and resilient for emerging workloads, from Alex Yost, Vice President, IBM PureSystems and System x, IBM Systems and Technology Group. X6 drives cloud and big data for enterprises by achieving insight faster, thereby outperforming competitors. For more information on System x, visit http://ibm.co/Q7m3iQ.
http://www.scribd.com/doc/210715795/X6-The-sixth-generation-of-EXA-Technology
How the IBM Platform LSF Architecture Accelerates Technical Computing
Executive Summary
Advances in High Performance Computing (HPC) have resulted in dramatic improvements in
application processing performance across a wide range of disciplines, including
manufacturing, finance, and the geological, life and earth sciences. This mainstreaming of
HPC has driven solution providers towards innovative Technical Computing solutions that are faster,
scalable, reliable, and secure.
Today, these mission critical technical computing clusters are challenged with reducing cost and
managing complexity. Besides cost and complexity, data explosion in technical computing has
transformed compute-intensive application workloads to both compute and data-intensive. There
continues to be an unrelenting appetite to solve newer problems that are larger and even more
complex. This is straining technical computing environments beyond current limits. While today’s
technical computing application demands are growing, there are newer applications across several
domains that now demand HPC scale solutions. These newer business problems include fraud
detection, anti-terrorist analysis, social and biological network analysis, semantic analysis, drug
discovery and epidemiology, weather and climate modeling, oil exploration, and power grid
management.¹
Although most technical computing environments are quite sophisticated, many IT organizations
cannot fully utilize the available processing capacity in order to address newer business needs
adequately. For these organizations, effective resource management and job submission is an
extremely complex process that needs to meet stringent service level agreement (SLA) requirements
across multiple departments. This demands higher levels of shared infrastructure utilization and
better application processing throughput, while keeping costs lower. It is hard to optimize the
execution of a wide range of applications using clusters and ensure high resource utilization given
diverse workloads, business priorities and application resource needs.
To address these complex technical computing needs, IBM® Platform™ LSF® is successfully deployed
across many industries and is continuously evolving to address contemporary needs. The flagship
product of the IBM Platform Computing portfolio, IBM Platform LSF provides comprehensive,
intelligent, policy-driven scheduling features that enable users to fully utilize all their IT
infrastructure resources while ensuring optimal application performance.
This whitepaper describes key architectural aspects of IBM Platform LSF including its use model,
scheduling architecture, other core components and installation architecture. It highlights the
product’s architectural strengths that help address current business challenges by optimizing the use
of shared HPC resources. The target audience includes chief technical officers (CTOs), technical
evaluators and purchase decision makers, who need to understand the architectural capabilities of
LSF, and relate them to business benefits such as containing operational and infrastructure costs
while increasing scale, utilization, productivity and resource sharing in technical computing
environments.
¹ Big Data in HPC – Back to the future http://blogs.amd.com/work/2011/04/13/big-data-in-hpc-back-to-the-future/
Sponsored by IBM
Srini Chari, Ph.D., MBA
October, 2012
mailto:chari@cabotpartners.com
Cabot Partners Group, Inc., 100 Woodcrest Lane, Danbury CT 06810, www.cabotpartners.com
Cabot Partners: Optimizing Business Value
Introduction – Tuning Technical Computing Tasks
Advances in HPC and technical computing have resulted in dramatic improvements in application
processing performance across a wide range of disciplines. Although most technical computing
environments are quite sophisticated, many IT organizations find it challenging to maximize
productivity with available processing capacity and meet newer business needs adequately.
Today, HPC clusters typically consist of hundreds or thousands of compute servers, storage and
network interconnect components. These require substantial investment and drive up capital,
personnel and operating costs. For maximum Return on Investment (ROI), these technical computing
environments must be shared across several users and departments within an organization. The ever
increasing computing demands in a continuously growing compute cluster require fair sharing and
effective utilization of raw clustered compute capability. Sharing is made possible through
intelligent workload and resource management that includes job scheduling and fine grained control
over shared resources. Effective workload and resource management boosts cluster resource
utilization and Quality of Service (QoS) necessary for meeting business priorities and SLAs.
Technical compute cluster owners need to manage their existing deployed applications and also plan
for new business and application requirements. Maximizing throughput² and maintaining optimal
application performance are primary challenges that are hard to address simultaneously. High
throughput requires elimination of load imbalance among constituent compute nodes in a cluster.
Optimal application performance necessitates reduction in communication overhead by
appropriately mapping application workload to the best available compute resources in the cluster.
Such needs are addressed by workload management solutions that typically consist of a resource
manager and a job scheduler. Together, these prevent jobs from competing with each other for
limited shared resources in large clusters.
IBM Platform LSF is a powerful and comprehensive technical computing workload management
platform that supports diverse workloads, across several industry verticals, on a computationally
distributed system. It has proven capabilities such as the ability to scale to thousands of nodes, built-in
high availability, intelligent job scheduling and sophisticated yet simple-to-use resource
management capabilities that improve management of shared clusters. Features such as effective
monitoring and fine-grained control over workload scheduling policies are well suited for multiple
lines of business users within an organization. By maximizing the use of heterogeneous resources in a
shared computing environment, LSF ensures that resource allocation is aligned with business
priorities. System utilization and QoS improve as job throughput and application performance are
maximized. This reduces cycle times and maximizes productivity in mission critical HPC
environments.
This whitepaper covers key aspects of the IBM Platform LSF architecture and how this architecture
is optimized to address technical computing challenges. Highlights include key architectural aspects
of IBM Platform LSF including its use model, scheduling architecture, other core components and
installation architecture that together help optimize the use of shared resources. This paper aims to
empower CTOs, technical evaluators and purchase decision makers with a perspective on how the
architectural capabilities of LSF are well equipped to address today’s HPC challenges specific to
their business. Also included are the latest LSF features and benefits and how these help in
containing operational and infrastructure costs while increasing scale, utilization, productivity and
resource sharing in technical computing organizations.
² Throughput: number of jobs completed per unit of time
Technical computing environments challenged to maximize productivity
Intelligent workload and resource management are needed to maximize ROI and guarantee stringent SLAs
IBM Platform LSF intelligently schedules and guarantees completion of workloads across a distributed, heterogeneous, virtualized IT environment
The IBM Platform LSF Architecture
IBM Platform LSF provides resource-aware scheduling through its highly scalable and reliable
architecture with built-in availability features. It has a comprehensive set of intelligent, policy-driven
scheduling capabilities that enable full utilization of distributed cluster compute resources. The LSF
architecture is geared to address technical computing challenges faced by users as well as
administrators. Together with IBM Platform Application Center, LSF allows users to schedule
complex workloads through easy to use interfaces. With LSF, administrators can easily manage
shared cluster resources up to petaflop-scale while increasing application throughput, maintaining
optimum performance levels, and QoS that is consistent with business requirements and priorities.
Its modular architecture is unique and provides both higher scalability and flexibility by clearly
separating the key elements of job scheduling and resource management that are critical for HPC
workload management needs. These key elements are:
Task Placement Policies that govern exchange of load information within cluster nodes and are
used in decision making for task placement on cluster nodes
Mechanisms for transparent remote execution of scheduled jobs
Interfaces that support load sharing applications, and
Performance optimization of highly scalable HPC applications.
The following sections highlight how LSF works, how users access its key features, the LSF
scheduling architecture and its other core elements. Then, we briefly describe the installation
architecture indicating where each LSF component is active within a cluster and how it helps in job
scheduling and resource management tasks.
LSF Cluster Use Model
This section describes how a typical IBM Platform LSF cluster is accessed and used. Individual
compute resources in a technical computing organization are usually grouped into one or more
clusters that are managed by LSF. Figure 1 shows this cluster use model, and how the job
management and the resource management roles are played by different nodes in an LSF cluster. One
machine in the cluster is selected by LSF as the “master” node or master host. The master node plays
a key role in resource management and job scheduling functions of workload management. The
other nodes in the cluster act as slave nodes and can be harnessed by the scheduler, through its
scheduling algorithms, for executing jobs.
Master Nodes: When nodes start up, LSF uses intelligent, fault-tolerant algorithms for master node
selection. During system operation, if the master node fails, LSF ensures that another node takes the
place of the master, thus keeping the master node highly available and system services accessible to
users at all times. Job scheduling decisions are governed by business priorities and policies that are
set up by the LSF system administrator.
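The failover behavior described above can be pictured with a minimal sketch. It assumes a hypothetical rule in which master candidates are listed in order of preference and the first reachable one acts as master; the function name and host names are illustrative, and the whitepaper does not describe the actual selection algorithm LSF uses.

```python
from typing import Optional, Sequence, Set


def elect_master(candidates: Sequence[str], alive: Set[str]) -> Optional[str]:
    """Return the first candidate host that is currently reachable.

    Assumption for illustration only: candidates are ordered by preference
    and the first live one becomes master.
    """
    for host in candidates:
        if host in alive:
            return host
    return None  # no candidate is up, so the cluster has no master


candidates = ["hostA", "hostB", "hostC"]

# All candidates are up: the preferred host is master.
initial_master = elect_master(candidates, alive={"hostA", "hostB", "hostC"})

# hostA fails: the next reachable candidate takes over.
failover_master = elect_master(candidates, alive={"hostB", "hostC"})
```

Rerunning the same selection whenever the set of live hosts changes is what keeps the master role available in this toy model.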
Figure 1: LSF cluster use model (source: IBM)
The modular IBM Platform LSF architecture provides users and administrators better flexibility and scalability with separation of scheduling and resource management elements
An intelligent master-slave model for scheduling and management improves reliability and performance
Users connect to a distributed system via a client and submit their jobs to the job submission node.
As these user jobs queue up, the master decides where to dispatch the job for execution, based on the
resources required and current availability of the resources among slave nodes.
Slave Nodes: Each slave machine in the system collects its own “vital signs” or the load information
periodically and reports them back to the master. Detailed information on the load index³ for each
node in the distributed system is analyzed and used for scheduling decisions in order to reduce job
turnaround time and increase system throughput. LSF has unique algorithms for smart information
dissemination of the load index and resource usage status to optimize system scalability and
reliability. These algorithms are proven to scale up to thousands of nodes.
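To make the reporting loop concrete, the sketch below models hosts that send load reports and a master that picks the least-loaded eligible host. The class names, the fields chosen as load indices, and the selection rule are assumptions made for illustration; they are not the algorithms LSF implements.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LoadReport:
    host: str
    run_queue: float   # average number of runnable processes
    cpu_util: float    # fraction of CPU in use, 0.0 to 1.0
    free_mem_mb: int   # free memory in megabytes


class MasterView:
    """Keeps the most recent load report received from each host."""

    def __init__(self) -> None:
        self.latest: Dict[str, LoadReport] = {}

    def receive(self, report: LoadReport) -> None:
        # A newer report simply replaces the older one for that host.
        self.latest[report.host] = report

    def pick_host(self, needed_mem_mb: int) -> Optional[str]:
        # Keep only hosts with enough free memory for the job.
        eligible = [r for r in self.latest.values()
                    if r.free_mem_mb >= needed_mem_mb]
        if not eligible:
            return None
        # Prefer the shortest run queue, then the lowest CPU utilization.
        best = min(eligible, key=lambda r: (r.run_queue, r.cpu_util))
        return best.host


view = MasterView()
view.receive(LoadReport("node1", run_queue=2.0, cpu_util=0.90, free_mem_mb=8000))
view.receive(LoadReport("node2", run_queue=0.5, cpu_util=0.20, free_mem_mb=4000))
view.receive(LoadReport("node3", run_queue=0.1, cpu_util=0.05, free_mem_mb=1000))

chosen = view.pick_host(needed_mem_mb=2000)
```

Here node3 is the idlest host but lacks memory, so the job goes to node2, which shows why several load indices are considered together rather than one in isolation.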
Workload Execution: LSF has a remote execution component that starts or stops the jobs on the
assigned slave node. Once the scheduled jobs complete on slave nodes, the completion results and
job status are communicated to the user. LSF also generates reports on resource usage and detailed
job execution logs. Users can obtain job execution results on a local node, transparently, as if they
were executing those jobs locally. LSF frees users from having to decide which nodes are best for
executing a job while allowing administrators to set up policies for job execution logic that are best
suited to business needs.
There are options to checkpoint a job that is running on a slave node, or move a running job to a
different slave node and then resume execution. This feature can help to temporarily suspend
running jobs, free up resources for any critical jobs, and then resume jobs from the last execution
point instead of having to restart them all over, thus improving system flexibility and utilization.
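The idea of resuming from the last execution point can be sketched with a toy job that saves its progress. The `Checkpoint` type and `run_job` function are hypothetical and only illustrate the concept; they do not represent how LSF performs checkpointing or job migration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    next_step: int     # first step that has not run yet
    partial_sum: int   # result accumulated so far


def run_job(total_steps: int, start: Optional[Checkpoint] = None,
            stop_after: Optional[int] = None) -> Checkpoint:
    """Sum the integers 0..total_steps-1, optionally stopping early.

    stop_after limits how many steps run in this call, which stands in for
    the scheduler suspending the job to free resources.
    """
    step = start.next_step if start else 0
    acc = start.partial_sum if start else 0
    executed = 0
    while step < total_steps:
        if stop_after is not None and executed >= stop_after:
            break
        acc += step
        step += 1
        executed += 1
    return Checkpoint(next_step=step, partial_sum=acc)


saved = run_job(10, stop_after=4)    # suspended after steps 0..3
finished = run_job(10, start=saved)  # resumed from the saved state
```

Because the second call starts from `saved`, the four completed steps are not repeated, which is the saving the paragraph above describes.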
LSF Scheduler
Scheduling is a key component of any workload and resource management solution. Figure 2 shows
the central component of the LSF scheduling architecture, which provides support for multiple
scheduling policies. When a job is submitted to LSF, many factors control when and where the job
starts to run. These factors include the active time window of the queues or hosts, resource
requirements of the job, availability of eligible hosts, various job slot limits, job dependency
conditions, fair-share constraints and load conditions.
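A few of the factors listed above can be combined into a single dispatch check, as in the following sketch. The data model is hypothetical and deliberately simplified: it covers only a queue time window (one that does not cross midnight), a slot limit, a memory requirement and job dependencies, and it is not LSF's actual decision logic.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Set


@dataclass
class Queue:
    opens: time            # start of the queue's active window
    closes: time           # end of the window (same day)
    slot_limit: int
    slots_in_use: int = 0


@dataclass
class Job:
    name: str
    mem_mb: int
    depends_on: List[str] = field(default_factory=list)


def can_dispatch(job: Job, queue: Queue, now: time,
                 host_free_mem_mb: int, finished: Set[str]) -> bool:
    in_window = queue.opens <= now < queue.closes
    has_slot = queue.slots_in_use < queue.slot_limit
    fits = host_free_mem_mb >= job.mem_mb
    deps_done = all(dep in finished for dep in job.depends_on)
    return in_window and has_slot and fits and deps_done


day_queue = Queue(opens=time(8, 0), closes=time(18, 0), slot_limit=2)
post = Job("post", mem_mb=2000, depends_on=["pre"])

blocked = can_dispatch(post, day_queue, time(9, 0), 4000, finished=set())
ready = can_dispatch(post, day_queue, time(9, 0), 4000, finished={"pre"})
after_hours = can_dispatch(post, day_queue, time(19, 0), 4000, finished={"pre"})
```

The job is held while its dependency is unfinished or the queue window is closed, and becomes dispatchable only when every condition holds at once.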
³ Load Index: LSF defines a load index for each type of resource. The load index quantifies each node’s loading condition. Depending on the nature of the
resource, some possibilities are queue length, utilization, or the amount of free resource. Reference: Utopia – a load sharing facility for a large scale
heterogeneous system
http://cse.unl.edu/~lwang/project/Utopia_A%20Load%20Sharing%20Facility%20for%20Large,%20Heterogeneous%20Distributed%20Computer%20Systems.pdf
Figure 2: LSF scheduling architecture (source: IBM)
Smart scheduling algorithms reduce time to results and maximize throughput while improving reliability
The LSF scheduler supports multiple policies aligned with business needs
One unique architectural feature of the LSF scheduler is that it allows multiple scheduling policies to
coexist in the same system. This means that to make scheduling decisions, LSF accommodates
multiple scheduling approaches that can run concurrently and be used in any combination, including
user-defined custom scheduling approaches. The LSF scheduler plug-in API can be used to
customize existing scheduling policies or implement new ones that can operate with existing LSF
scheduler plug-in modules. These custom scheduling policies can influence, modify, or override LSF
scheduling decisions, thus empowering administrators to model the job scheduling decisions aligned
with business priorities. The scheduler plug-in architecture is fully external and modular; new
scheduling policies can be prototyped and deployed without changing the compiled code of LSF.
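The way several policies can coexist and a site-defined policy can override earlier decisions is illustrated below. This is a generic plug-in pattern written in Python; the `Scheduler` class, the policy functions and the job fields are assumptions for illustration and do not reflect the actual LSF scheduler plug-in API.

```python
from typing import Callable, Dict, List

Job = Dict[str, object]
Policy = Callable[[List[Job]], List[Job]]


class Scheduler:
    """Applies registered policies in order; each may reorder or filter jobs."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []

    def register(self, policy: Policy) -> None:
        self.policies.append(policy)

    def order(self, jobs: List[Job]) -> List[Job]:
        for policy in self.policies:
            jobs = policy(jobs)
        return jobs


def by_priority(jobs: List[Job]) -> List[Job]:
    # Built-in style policy: highest priority first.
    return sorted(jobs, key=lambda j: j["priority"], reverse=True)


def urgent_first(jobs: List[Job]) -> List[Job]:
    # Site-defined policy: urgent jobs jump the line. The sort is stable,
    # so the ordering from earlier policies is kept within each group.
    return sorted(jobs, key=lambda j: not j.get("urgent", False))


pending = [
    {"id": "a", "priority": 10},
    {"id": "b", "priority": 50},
    {"id": "c", "priority": 20, "urgent": True},
]

default_scheduler = Scheduler()
default_scheduler.register(by_priority)
default_order = [j["id"] for j in default_scheduler.order(pending)]

custom_scheduler = Scheduler()
custom_scheduler.register(by_priority)
custom_scheduler.register(urgent_first)  # added without touching the core class
custom_order = [j["id"] for j in custom_scheduler.order(pending)]
```

Registering `urgent_first` changes the outcome without modifying `Scheduler` itself, which mirrors the point that new policies can be added without changing compiled scheduler code.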
LSF Core Components
LSF takes job requirements as inputs, finds the best resources to run the job, schedules and executes
jobs and monitors their progress. Jobs always run according to host load and site policies. This section
provides an overview of some of the core components of LSF and their key role in job scheduling
and resource management functions. LSF is a layer of software services on top of UNIX and
Windows operating systems that creates a single pool of networked compute and storage resources.
This layered service model (Figure 3) provides a resource management framework to allocate,
manage and use resources as a single entity. The three basic components of this layer are LSF Base,
LSF Batch and LSF Libraries and together they help in distributing work across existing
heterogeneous IT resources; creating a shared, scalable, and fault-tolerant infrastructure that delivers
faster and more reliable workload performance.
LSF Base provides basic load-sharing services for the distributed system such as resource usage
information, host selection, job placement advice, transparent remote execution of jobs and remote
file options. These services are provided through the following sub-components:
Load Information Manager (LIM)
Process Information Manager (PIM)
Remote Execution Server (RES)
LSF Base application programming interface (API)
Utilities such as lstools, lstcsh and lsmake.
LSF Batch extends LSF base services to provide a batch job processing system along with load
balancing and policy-driven resource allocation control. To provide this functionality, LSF Batch
uses the following LSF base services:
Figure 3: LSF services - high level architecture (source: IBM)
The LSF scheduler minimizes latencies for short jobs while improving performance for long jobs
LSF core components help in distributing work across existing heterogeneous IT resources; creating a shared, scalable, and fault-tolerant infrastructure
Resource and load information from LIM to perform load balancing activities
Cluster configuration information and master LIM election service from LIM
RES for interactive batch job execution
Remote file operation service provided by RES for file transfer.
LSF Libraries provide APIs for distributed computing application developers to access job
scheduling and resource management functions. There are two LSF libraries: LSLIB and LSBLIB.
LSLIB is the core library that provides basic workload management services to applications
across a shared cluster and is a runtime library to easily develop load sharing applications.
LSLIB implements a high level procedural interface that allows applications to interact with
LIM and RES. The other library, LSBLIB, is the batch library and it provides batch services
that are required to submit, control, manipulate, and queue jobs on system nodes.
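LSLIB and LSBLIB are libraries that applications link against directly. As a lighter-weight illustration of the same submit operation, the sketch below assembles an argument list for the LSF `bsub` submission command using its `-q` (queue), `-n` (slots), `-J` (job name) and `-o` (output file) options. The helper name `build_bsub`, the queue name and the application command are placeholders, and the snippet only builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_bsub(command: str, queue: Optional[str] = None,
               slots: Optional[int] = None, job_name: Optional[str] = None,
               output: Optional[str] = None) -> List[str]:
    """Build an argument list for bsub; nothing is executed here."""
    argv = ["bsub"]
    if queue is not None:
        argv += ["-q", queue]
    if slots is not None:
        argv += ["-n", str(slots)]
    if job_name is not None:
        argv += ["-J", job_name]
    if output is not None:
        argv += ["-o", output]
    # The job's own command and arguments follow the bsub options.
    argv += shlex.split(command)
    return argv


# Placeholder queue and application; %J in the output name is replaced
# by the job ID when LSF writes the file.
argv = build_bsub("./simulate --steps 100", queue="normal", slots=4,
                  job_name="sim", output="sim.%J.out")
```

On a host where LSF is installed, the list could be handed to `subprocess.run(argv, check=True)`; the scheduler then chooses the execution host, so the user does not name one.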
LSF Installation Architecture
LSF consists of a number of servers or daemon processes that run with root privileges on each
participating host (Figure 4) in the system and a comprehensive set of utilities that are built on top of
the LSF API. There are multiple LSF processes running on each host in the distributed system. The
type and number of processes running depend on whether the host is a master host, a compute or
slave host or one of the master node candidates as shown in Figure 5.
LSF libraries provide APIs for application developers to access job scheduling and resource management functionality of LSF.
LSF consists of a number of servers or daemon processes that run with root privileges on each participating host
Figure 4: LSF daemons and their functions in scheduling & resource management (source: IBM)
On each participating host in an LSF cluster, an instance of LIM runs and exchanges load information
with its peers on other hosts and provides applications and associated tasks with a list of hosts that
are best for execution. Multiple resources on each host and resource demands of each application are
considered in LIM placement decisions. In addition to helping LSF make placement decisions, LIM
also provides load information to those applications that make their own placement decisions.
Besides LIM, RES is another server or daemon on each host. RES provides the mechanisms for
transparent remote execution of arbitrary tasks. Typically, after placement advice has been obtained
from LIM, a stream connection is established between the local application and its remote task
through RES on the target host. This is followed by remote task initiation. LSF supports several
models of remote execution to meet the diverse functional and performance requirements of
applications. A LIM and a RES run on every Platform LSF server host. They interface with the
host’s operating system to give users a uniform, host-independent environment. Figure 6 shows
sample job submission steps, for regular as well as batch jobs that run on an LSF system and various
interactions between LSF components during job submission and execution.
LSF Architectural Strengths
The architectural strength of LSF results from its modular structure, which even allows parts of the
system to be used independently of other parts. For instance, a task can be executed on a remote host
specified by the user, in which case LSLIB contacts the remote RES component directly without needing the
LIM component. Similarly, load information and placement advice from LIM may be obtained for
purposes other than remote execution. Another advantage of the LSF architecture is that policies and
mechanisms of load sharing may be changed independently of each other as well as independently of the
applications running on the system. This provides significant fine-grained control over resource sharing
and job scheduling.
Figure 5: Installation architecture with various LSF processes running on different nodes in an LSF-managed cluster (source: IBM)
Figure 6: Interactions between various LSF components during job submission and execution (source: IBM)
LSF supports several models of remote execution to meet the diverse functional and performance requirements of applications
The LSF modular structure even allows parts of the system to be used independent of other parts
While LSF manages distributed system sharing and job scheduling complexities with its smart
architecture, it also provides easy-to-use and simple interfaces that improve productivity of both
users and administrators and boost collaboration in technical computing organizations. The highly
available single master node concept for managing an entire cluster simplifies distributed systems
management and frees up domain experts to focus on value added work instead of the tedious job
scheduling and system management tasks. At higher scale, LSF deploys a hierarchical master node
concept internally but all that complexity is hidden and does not impact its simplified use model.
Users can access systems with thousands of nodes that could be spread across geographies through
additional LSF components such as LSF Multi-Cluster. LSF is architected to run on a variety of x86
hardware and operating environments including the latest generation of IBM System x servers and is
also certified on IBM Power Systems servers running the AIX and Linux operating systems.
IBM Platform LSF Benefits
LSF allows multiple users to share heterogeneous assets more effectively in a shared computing
environment.
Consequently, people are more
productive, projects are completed
earlier and because computer
utilization is better, infrastructure
costs are contained.
By consolidating compute resources
from multiple, distributed systems,
workload can be distributed more
efficiently across an organization’s
technical computing assets that are
geographically dispersed. With this
capability, effective sharing of
resources can be extended from a
single cluster to enable flexible
hierarchical or peer-to-peer workload
distributions between multiple clusters.
LSF addresses the problem of underutilized compute resources by enabling
local administrators to retain control of their own assets while still permitting remote systems to tap
into idle capacity.
Cluster-level capabilities in LSF transparently extend to the grid. This makes it exceptionally fast and
cost-efficient to deploy on grids, eliminating the need for sites to implement an expensive,
customized scheduling layer to share resources between clusters.
With simple interfaces and a plug-in modular architecture, LSF lowers the learning curve and
increases cluster user productivity, reduces application integration and training costs, and speeds up
job completion by eliminating manual job submission errors through automation. Technical
computing users obtain faster results and complete more jobs using shared cluster resources at lower
costs.
IBM Platform LSF: Complete, powerful, scalable Workload Management Solution.
Benefits:
Advanced, feature-rich workload scheduling
Robust set of add-on features
Integrated application support
Policy & Resource aware scheduling
Resource consolidation for maximum performance
Automation & Advanced self management
Thousands of concurrent users & jobs
Optimal utilization, less infrastructure costs
Better user productivity, faster time to results
Best TCO: flexible control, multiple policies, robust capabilities, administrator productivity
The LSF smarter scheduling advantages:
Higher throughput at a lower cost
Flexibility to address changing business needs
Better asset utilization & ROI
Better service levels to end-users
Increased automation & reduced manual intervention
In short, LSF equips technical computing environments to achieve the following benefits:
Obtain higher-quality results faster
Reduce infrastructure and management costs, and
Easily adapt to changing user requirements.
Conclusions
Flexibility, scalability and agility are the key requirements of technical computing environments.⁴
Technical computing users typically run varied applications and workloads on clusters and large
scale distributed systems. These workloads may be performance-sensitive, compute-intensive,
data-intensive or a combination of these.
To support large technical computing clusters, customers are challenged with manual tasks and
cumbersome tools, issues related to integration and the need for multiple dedicated personnel to
develop and maintain custom integration between various tools and applications. This increases costs
and business risks because a lot of the mission-critical functionality could be expensive or
time-consuming to realize. Instead of focusing on core high-value tasks, administrators could also be
consumed by mundane manual systems management tasks. These environments demand reliability as
well as scalability from the underlying IT infrastructure. However, budgetary constraints and
competitive pressures make it imperative to increase resource utilization and improve infrastructure
sharing efficiencies to achieve better collaboration, productivity and faster time to results.
In such large scale distributed systems, computing resources are made available to users through
dynamic and transparent load sharing provided by IBM Platform LSF. Through its transparent
remote job execution, LSF harnesses powerful remote hosts to improve application performance,
enabling users to access resources from anywhere in the system. The IBM Platform LSF product
family has the broadest set of capabilities in the industry which are tightly integrated and fully
supported by IBM. As part of an even broader portfolio of offerings from IBM and IBM Business
Partners, LSF can be packaged with more engineering, integration and process capabilities. This
further enhances productivity of technical computing users, enabling them to focus more on their
core business, engineering or scientific tasks. It also reduces future strategic risk as the business
evolves.
The IBM Platform LSF architecture is geared to create a scalable, reliable, highly utilized and
manageable shared infrastructure for technical computing environments, with powerful resource
management and scheduling solutions that cut across cluster silos. Its modular architecture provides
much-needed flexibility and fine-grained control while speeding up job turnaround times and
improving productivity. The simple interfaces and easy customization features of LSF and its
complementary products reduce complexity and management costs. They also facilitate better
collaboration, tighter integration, and alignment of scheduling and resource management tasks with
business objectives and priorities. LSF is architected to place workloads based not only on whether
a cluster machine is capable of running a workload, but also on which host is best able to run it,
while ensuring that broader business policies and requirements are met.
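The distinction between a host that is merely capable of running a workload and the host that is best able to run it can be illustrated with a minimal sketch. This is not LSF code or an LSF API: the `Host` and `Job` types, the field names, and the ranking rule (lowest load first, then most free memory) are assumptions chosen only to show the two-step idea of filtering on capability and then ranking the remaining candidates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str           # hypothetical host name
    free_slots: int     # unused cores on the host
    free_mem_gb: float  # unused memory on the host
    load: float         # normalized load; lower means less busy


@dataclass
class Job:
    slots: int
    mem_gb: float


def is_capable(host: Host, job: Job) -> bool:
    """Step 1: can this host run the job at all?"""
    return host.free_slots >= job.slots and host.free_mem_gb >= job.mem_gb


def best_host(hosts: List[Host], job: Job) -> Optional[Host]:
    """Step 2: among capable hosts, prefer the least loaded one.

    Ties on load are broken in favor of the host with more free memory.
    Returns None when no host can run the job.
    """
    candidates = [h for h in hosts if is_capable(h, job)]
    if not candidates:
        return None
    return min(candidates, key=lambda h: (h.load, -h.free_mem_gb))


hosts = [
    Host("hostA", free_slots=8, free_mem_gb=64.0, load=0.9),
    Host("hostB", free_slots=4, free_mem_gb=32.0, load=0.2),
    Host("hostC", free_slots=2, free_mem_gb=16.0, load=0.1),
]
job = Job(slots=4, mem_gb=24.0)
chosen = best_host(hosts, job)
print(chosen.name if chosen else "job stays pending")
```

In this example hostC is the least loaded host but lacks the slots, so it is filtered out; hostA and hostB are both capable, and hostB is selected because it is less busy. A real scheduler would also apply the business policies mentioned above (queues, priorities, limits), which this sketch omits.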
IBM Platform LSF lowers operating costs by smartly matching the limited supply of shared
resources with application demands and business priorities through features such as guaranteed
resources, live re-configuration, fair-share and pre-emptive scheduling enhancements, better
performance and scalability. IBM continues to enhance the capabilities of LSF and LSF add-on
components. Clients can expect IBM to deliver capabilities to deploy new LSF add-on components
on demand to keep up with the ever-changing requirements of the technical computing marketplace.
[4] Trends from the trenches: Bio IT World 2012, http://www.slideshare.net/chrisdag/2012-trends-from-the-trenches
Technical computing organizations need flexibility, scalability, and agility at lower costs and risks.
LSF can be packaged with engineering, integration and processes so that technical computing organizations can become more productive and focus on their core business, engineering or scientific tasks.
LSF lowers IT costs smartly by matching the limited supply of shared resources with application demands and business priorities.
LSF is tuned to technical computing now and in the future.
Appendix: What’s new in LSF Version 9.1?
IBM Platform LSF virtualizes heterogeneous IT infrastructure and offers customers complete freedom of
choice. Through fully integrated and certified applications, custom application integration, and support for a wide
variety of operating systems, it ensures that current investments are preserved while providing the strategic
benefit of freedom to run the best platform for each job. The current LSF (Version 9.1) release
delivers improvements in performance and scalability over prior versions while introducing several
new features that simplify administration and boost the productivity of cluster users.
New Features in LSF: Functional details and Business Benefits

Performance and scalability
Functional details:
Improved query response (~10 ms), shorter scheduling cycle, memory optimization, shorter start/restart time, and parallel start-up/restart.
LSF has been extended to support an unparalleled scale of up to 160,000 cores and two million queued jobs for very high throughput EDA workloads.
On very large clusters with large numbers of user groups employing fair-share scheduling, the memory footprint of the master batch scheduler in LSF has been reduced by approximately 70% and scheduler cycle time has been reduced by 25%.
Business Benefits:
Faster job turnaround times. Faster time to results.
For a very large fair-share tree (e.g. 4K user groups, 500 users with -g), job election performance has been improved 10x.

Better usability & manageability
Functional details:
Clearer reporting of resource usage and pending reasons.
Better alternative job resource options for timely job execution.
Enhanced process tracking: LSF 9.1 leverages kernel cgroup functionality to replace/improve existing functionality for Process Tracking and Topology CPU/memory enforcement.
Fast detection of hung hosts/jobs, directory management. A new multi-threaded communication mechanism allows faster detection of unavailable hosts.
Business Benefits:
Speeds up troubleshooting, faster detection of failed or hung jobs, self-tuning, and better admin productivity.
Protection against user-initiated actions that can result in denial of service.
Timely job turnaround with alternate resources.

Scheduling enhancements
Functional details:
LSF 8 provided a guaranteed resource scheduling feature for groups of jobs based on slots (cores). LSF 9.1 extends this feature to more complex resource guarantees by supporting multi-dimensional packages. A package is a combination of slots and memory. This enables SLA scheduling to consider memory in addition to cores.
Besides numerous multi-cluster scheduling enhancements, such as enhanced interoperability across clusters and exchange of all load information between all clusters, it also provides CPU and memory affinity.
LSF 9.1 also provides alternative or time-based resource requirements to better align with business priorities with much finer-grained control.
Business Benefits:
LSF scheduling enhancements make the cluster more stable and reliable.
Better job control and more accurate, lightweight CPU and memory accounting, even for runaway and short job processes.
Fine-grained tuning and customization of infrastructure sharing policies ensure flexibility and agility in resource sharing that match closely with evolving business requirements.

IBM Platform - Advanced Edition Architecture
Functional details:
The new LSF - Advanced Edition architecture separates user interaction from scheduling, and divides the compute resource into a number of execution clusters, while presenting it to the users as a single cluster.
Business Benefits:
This new architecture delivers the expected increase in performance with the increase in capacity, resulting in a consistent user experience with scale.
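The multi-dimensional package described under scheduling enhancements (a package is a combination of slots and memory) lends itself to a small worked example. The sketch below is an illustration only, not LSF configuration or LSF code: the `Package` type, the host figures, and the helper names are hypothetical. It shows why a guarantee counted in packages can be much smaller than one counted in slots alone once memory is taken into account.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Package:
    slots: int   # cores per package
    mem_gb: int  # memory per package, in GB


def packages_on_host(free_slots: int, free_mem_gb: int, pkg: Package) -> int:
    """Whole packages a host can supply: limited by the scarcer dimension."""
    return min(free_slots // pkg.slots, free_mem_gb // pkg.mem_gb)


def guarantee_met(hosts: List[Tuple[int, int]], pkg: Package, guaranteed: int) -> bool:
    """True if the hosts together can supply the guaranteed number of packages."""
    total = sum(packages_on_host(slots, mem, pkg) for slots, mem in hosts)
    return total >= guaranteed


pkg = Package(slots=2, mem_gb=8)

# Hypothetical hosts as (free slots, free memory in GB).
hosts = [(16, 32), (8, 64), (4, 8)]

per_host = [packages_on_host(s, m, pkg) for s, m in hosts]
slots_only = [s // pkg.slots for s, _ in hosts]

print("packages per host:", per_host)       # memory-aware count
print("slot-only estimate:", slots_only)    # what a slots-only guarantee would assume
print("guarantee of 9 met:", guarantee_met(hosts, pkg, 9))
```

With these figures the first host has slots for eight packages but memory for only four, so a slots-only view would promise 14 packages across the pool where only 9 actually fit.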
LSF add-on modules have also been enhanced in the latest version 9.1. The LSF License Scheduler
now handles parallel jobs in which each rank checks out a license directly more efficiently, and it
no longer needs a restart for configuration changes to take effect. IBM Platform RTM has enhanced
filtering and drill-down capabilities, along with support for IBM General Parallel File System (GPFS)
monitoring. LSF Process Manager now supports non-LSF batch systems and the IBM Platform
Symphony product. IBM Platform Application Center has an improved interface with IBM Platform
Analytics, which now supports Tableau (v8), Vertica (5.1), and the latest BI reporting
capabilities.
IBM Platform LSF V9.1 delivers significantly enhanced performance, scalability, manageability and
usability as well as new scheduling capabilities. The new Platform LSF - Advanced Edition
provides more than three times the scalability of prior versions of LSF, enabling clients to
consolidate their compute resources to achieve maximum flexibility and utilization.
For clients looking to improve service levels and utilization with a dynamic, shared HPC cloud
environment, IBM Platform Dynamic Cluster V9.1 is now available as an add-on to IBM Platform
LSF. Platform Dynamic Cluster turns static Platform LSF clusters into dynamic, shared cloud
infrastructure. By automatically changing the composition of clusters to meet ever-changing
workload demands, service levels are improved and organizations can do more work with less
infrastructure. With smart policies and numerous features such as live job migration and checkpoint-
restart, Platform Dynamic Cluster enables clients to realize improved utilization, better reliability,
and increased productivity, while reducing administrator workload.
The new IBM Platform Session Scheduler V9.1 is designed to work with Platform LSF to provide
high-throughput, low-latency scheduling for a wide range of workloads. It is particularly well suited
to environments that run high volumes of short-duration jobs, and where users require faster and more
predictable job turnaround times. Unlike traditional batch schedulers that make resource allocation
decisions for every job submission, Platform Session Scheduler enables users to specify resource
allocation decisions only once for multiple jobs in a user session, providing users with their own
virtual private cluster. With this more efficient scheduling model, users benefit from higher job
throughput and faster response times while cluster administrators realize an overall improvement in
cluster utilization.
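The effect of making the allocation decision once per session rather than once per job can be sketched with a simple model. The code below is not Session Scheduler code and uses no LSF interface. The overhead values, task durations, and slot count are hypothetical, and charging the per-job decision cost to the slot that runs the job is a deliberate simplification.

```python
import heapq
from typing import List


def makespan(durations: List[float], slots: int,
             per_task_overhead: float, one_time_overhead: float = 0.0) -> float:
    """Finish time when each task is given to the slot that frees up first.

    per_task_overhead models a scheduling decision paid for every job;
    one_time_overhead models a single allocation decision paid up front.
    """
    free_at = [one_time_overhead] * slots  # time at which each slot is next free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + per_task_overhead + duration)
    return max(free_at)


tasks = [1.0] * 100  # 100 short jobs of 1 time unit each (hypothetical)

# Traditional batch model: one allocation decision per job.
per_job = makespan(tasks, slots=4, per_task_overhead=0.5)

# Session model: one allocation decision for the whole session.
per_session = makespan(tasks, slots=4, per_task_overhead=0.0, one_time_overhead=0.5)

print(f"decision per job:     {per_job}")      # 25 jobs per slot x 1.5 = 37.5
print(f"decision per session: {per_session}")  # 0.5 + 25 jobs per slot x 1.0 = 25.5
```

The shorter the jobs are relative to the decision cost, the larger the gap becomes, which is consistent with the paper's point that this model suits high volumes of short-duration jobs.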
To learn more about current IBM Platform LSF product features, visit:
http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/lsf/index.html
Copyright © 2012 Cabot Partners Group, Inc. All rights reserved. Other companies’ product names, trademarks, or service marks are used herein for identification only and belong to their
respective owner. All images and supporting data were obtained from IBM or from public sources. The information and product recommendations made by the Cabot Partners Group are
based upon public information and sources and may also include personal opinions both of the Cabot Partners Group and others, all of which we believe to be accurate and reliable.
However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. The Cabot Partners Group, Inc. assumes
no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and
recommendations presented herein, nor for any inadvertent errors which may appear in this document. This document was developed with IBM funding. Although the document may
utilize publicly available material from various vendors, including IBM, it does not necessarily reflect the positions of such vendors on the issues addressed in this document.