5. Table of Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
About This Book......................................................................... 2
Icons Used in This Book............................................................. 2
Beyond the Book......................................................................... 3
Chapter 1: Getting to Know Big Data and Analytics. . . . 5
Looking at Big Data and Analytics............................................ 6
Volume............................................................................... 6
Velocity.............................................................................. 7
Variety................................................................................ 7
Examining Types of Analytics................................................... 8
Understanding the Business Need for BD&A.......................... 8
Competitive advantage.................................................... 9
Enhanced ROI.................................................................. 10
Improved customer experience.................................... 10
Introducing BD&A Infrastructure............................................ 10
Promoting success......................................................... 11
An approach you can follow......................................... 12
Chapter 2: Looking at Infrastructure for Big Data
and Analy tics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Looking at a BD&A Hardware Infrastructure........................ 15
Customer business goals............................................... 16
Needs analysis................................................................ 16
Introducing the Infrastructure Components of BD&A......... 16
Scalability of infrastructure........................................... 17
Parallel processing......................................................... 17
Low latency resources................................................... 18
Data optimization........................................................... 18
Understanding the Aspects of a Well-Architected
Infrastructure........................................................................ 20
Speed/performance........................................................ 20
Availability....................................................................... 20
Access.............................................................................. 21
Chapter 3: Looking at IBM Infrastructure Choices. . . . 23
IBM Servers................................................................................ 23
IBM Power Systems........................................................ 24
IBM System z................................................................... 25
IBM PureSystems............................................................ 25
6. Big Data Analytics Infrastructure For Dummiesiv
IBM Storage................................................................................ 26
Looking at storage basics.............................................. 26
IBM FlashSystem............................................................. 27
IBM System Software................................................................ 28
IBM Elastic Storage......................................................... 29
Platform Computing....................................................... 29
Chapter 4: Solving Big Problems with Powerful
Solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Energy and Utilities.................................................................. 31
Business Need................................................................. 32
Solution............................................................................ 32
Fashion....................................................................................... 32
Travel......................................................................................... 33
Finding the solution........................................................ 34
Looking at the benefits.................................................. 34
Responding to customer behavior............................... 34
Healthcare.................................................................................. 35
Insurance................................................................................... 35
Overview.......................................................................... 36
Solution............................................................................ 36
Travel and Transportation...................................................... 36
Business need................................................................. 37
Solution............................................................................ 37
Chapter 5: Ten Questions LoB and IT Execs
Should Ask Each Other . . . . . . . . . . . . . . . . . . . . . . . . . 39
IT to LoB: Are You Aware of the Benefits of BD&A
Infrastructure?....................................................................... 40
IT to LoB: Are You Aware of BD&A’s Return On
Investment?............................................................................ 40
LoB to IT: How Can I Benefit from Bringing the
Back Office to the Front Office?........................................... 41
LoB to IT: Do I Need to Be On the Cloud?.............................. 41
LoB to IT: Should I Worry about Obsolescence?.................. 41
IT to LoB: Where and When Would Performance
Issues Occur?......................................................................... 42
IT to LoB: Are There QoS or SLA Concerns?......................... 42
IT to LoB: What Security Requirements Exist?..................... 42
IT to LOB: Does the BD&A Application Require
Mission-Critical Resiliency?................................................. 42
IT to LoB: How Will the BD&A Application Expand?............ 43
29. Chapter 3
LookingatIBM
InfrastructureChoices
In This Chapter
▶ Seeing a world of infrastructure options
▶ Examining the storage universe
Apowerful infrastructure goes a long way toward improving
success in implementing Big Data and Analytics (BD&A).
The alternative is failure, which usually isn’t an option for enter-
prises. Chapter 2 discusses infrastructure capabilities in a gen-
eral way, but this chapter gives you a look at the IBM families of
server, storage, and system software products.
Many products from various sources appear at first to meet
the BD&A challenge, but that’s not the whole story. You’re
better served by using products that have superior features
designed into them. So, in addition to brute force processing/
storage power, look for products with intuitive management
that require a minimum of manual intervention and result in
reduced administrator time.
IBM Servers
You may want to become familiar with key IBM product line
names. They’re the core of a BD&A infrastructure and fall into
the worlds of processing and storage.
In all cases, infrastructure needs for BD&A meet six critical
business needs. Examples are acquire, grow, and retain custom-
ers; transform financial management processes; optimize oper-
ations and reduce fraud; manage risk; improve IT economics;
30. Big Data Analytics Infrastructure For Dummies24
and create new business models. For each entry point, certain
infrastructure design points (speed, access, availability) are
more important than others. Certain design points are better
enabled by specific hardware and software infrastructure capa-
bilities and architectures.
Some deployments are more aligned with scale-out server
solutions, while others are used with scale-up solutions:
✓ Scaling out (scaling horizontally) means adding more
nodes to a system, such as adding a new computer to a
distributed software application. Hundreds of small com-
puters may be configured in a cluster to obtain aggregate
computing power that often exceeds that of computers
based on a single traditional processor.
✓ Scaling up (scaling vertically) means adding resources
to a single node in a system. It typically involves adding
CPUs or memory to a single computer. Such vertical scal-
ing of existing systems also enables them to use virtual-
ization technology more effectively, as it provides more
resources for the hosted set of operating system and
application modules to share.
Choose a scale-in, scale-up, or scale-out server infrastructure to
support the complexity and breadth of analytic workloads. IBM
provides an integrated, high-performance infrastructure includ-
ing core server, storage, networking, and systems software tech-
nology. You can accelerate the flow of data and insights, provide
shared and highly secure access to all types of data where it
resides, and significantly improve right-time availability. The
core systems of the BD&A infrastructure are covered in this
section.
IBM Power Systems
IBM Power Systems are highly virtualized, based on IBM next-
generation POWER8 multicore processors. Designed around
the world’s first open server ecosystem, they bring together
the computing power, memory bandwidth, and I/O in ways that
are optimized for innovative data hungry applications. Power
System’s support Linux open standards, as well as the industry
leading AIX UNIX. The platform enables choice in design and
deployment to meet a wide range of data center requirements.
31. Chapter 3: Looking at IBM Infrastructure Choices 25
There are multiple deployment styles for business analytics,
including server nodes for business analytics grids and scal-
able server nodes to analyze data stored in data warehouses.
The proximity of data stored near (or on) these systems
enhances overall throughput, as do fast server I/O links con-
necting to external storage and network switches.
These capabilities have made Power Systems the platform
that has driven innovative BD&A solutions, such as the IBM
Watson solution that’s now commercially available for clients
looking to take their analytics to the next level by leveraging
cognitive computing.
IBM System z
IBM’s mainframe is System z. Much of the world’s transactional
and financial data still reside on mainframe systems, and the
System z BD&A solutions are centered on the concept of moving
analytics processing closer to the data, integrating and manag-
ing the full variety, velocity, and volume of data that comprises
advanced analytics business solutions. This approach relies on
the structured data already residing on System z but also incor-
porates feeds from unstructured and streaming data sources.
IBM System z mainframes deliver the ability to easily handle mas-
sive data volumes while also scaling quickly and cost-effectively
as needs arise — efficiently protecting and governing data and
delivering leading-edge computing performance.
IBM System z is highly virtualized through logical partitioning
(LPARs) and supports several operating systems, including
System z z/OS environment and Linux distributions running
on System z Integrated Facility for Linux (IFL) specialty pro-
cessors. This solution can dramatically reduce downtime (by
over 99 percent) and provides the highest level of support in
mission critical applications. The built-in support for Linux
opens System z to a wide variety of business analytics use
cases, supporting workload consolidation and linking to other
servers (for example, Power Systems blades).
IBM PureSystems
IBM PureSystems is a series of integrated systems that allow
choice and flexibility in server deployments, including both
x86 and non-x86 server nodes. The nodes are connected by
32. Big Data Analytics Infrastructure For Dummies26
a high-speed fabric and built-in switches. These integrated
systems have built-in security features. Flexibility in server
deployments means that you can add nodes later on as
demand for computing resources grow. A single console man-
ages shared system resources.
Three major areas of focus in the IBM PureScale design are as
follows:
✓ Scale-in systems design: The system’s overall design
automates integrated server, storage, and networking
resources for both physical and virtual environments.
✓ Patterns of expertise: The systems embed software tools
that automate configuration, upgrades, and application
requirements.
✓ Cloud-ready integration: Designed to be deployed to
support private self-service cloud infrastructure and
software. That enables flexible scale-up and provisioning
of workloads.
IBM Storage
IBM has a portfolio of storage capabilities for BD&A deploy-
ments. In this section, you take a look at those and get the
lowdown on storage basics.
Looking at storage basics
IBM storage solutions are designed to save data center real
estate through the use of dense storage and power efficiency.
Essential storage technologies are built in and tightly inte-
grated. Essentially, they have the following features:
✓ Self-optimizing
✓ Cloud-agile
✓ Reduced space requirements
33. Chapter 3: Looking at IBM Infrastructure Choices 27
✓ Intuitive management
✓ No manual intervention required
✓ Reduced administrator time
IBM storage solutions use techniques such as virtualization,
deduplication, data compression, automated tiering, and cata-
loging. Take a look at the nearby sidebar “A brief glossary of
terms” for more information.
Like servers, storage can be scaled up and scaled out. Check
out the section “IBM Servers” earlier in the chapter for more
information on scaling up and scaling out.
IBM FlashSystem
Huge amounts of data are generated by sensors and comput-
ers in addition to data created by people. It’s hard to keep up
with data growth. And then, without high-performance stor-
age, it can be impossible to extract meaningful insights from
rapidly accumulating data in areas such as customer satisfac-
tion, operational efficiency, financial processes, risk, fraud,
and compliance management.
The latest innovation from a BD&A perspective is the use of
flash technology for storage. You can boost business agility
and insight with flash storage for analytics. That’s where IBM
FlashSystem comes in.
IBM FlashSystem is fast, so it unlocks storage performance
to give analytics applications the speed they need. But speed
is just one component of effective analytics applications.
Storage must also meet three key requirements in order to
effectively support analytics workloads:
✓ Compelling data economics
✓ Enterprise resiliency and
✓ Easy infrastructure integration
34. Big Data Analytics Infrastructure For Dummies28
FlashSystem delivers advanced technologies in each of these
areas, so organizations can unleash the power of their data for
critical business insights.
IBM System Software
IBM provides two key system software offerings designed
for BD&A: IBM Elastic Storage (formerly General Parallel File
System) and Platform Computing.
A brief glossary of terms
When you talk about infrastructure,
youcanacquaintseveralusefulterms:
✓ Virtualization: Creates a virtual
(rather than actual) version of
something. This goes back to
the days of creating mainframe
partitions, so multiple jobs could
run simultaneously on the same
machine. Hardware virtualization
is creating a “virtual machine”
that acts like a real computer with
an operating system. Hardware
virtualizationisawayofimproving
efficiency.
✓ Deduplication: Eliminates dupli-
cate copies of the same data.
You get two benefits from data
deduplication: a single version of
the truth and excellent leverage
in storage.
✓ Data compression: Encodes
information using fewer bits than
theoriginalrepresentation.That’s
a space saver. Compression
helps reduce resource usage,
such as data storage space or
transmission capacity.
✓ Automated tiering: Automates
the movement of data across
different types of storage devices
and media (tiering). The most
important data goes to flash or
cacheddisks.Lessimportantdata
goes to less expensive, slower
devices,suchasopticaldisksand
tape drives.
✓ Cataloging: Comprises views
and tables that contain infor-
mation about all the objects in
the database — sometimes
referred to as metadata. As the
database changes, the catalog
changes. You can leverage this
in many ways, including report
generation, automation of
required database changes, and
checking compliance with data-
base naming conventions.
35. Chapter 3: Looking at IBM Infrastructure Choices 29
IBM Elastic Storage
IBM Elastic Storage is a high-performance, cost-effective stor-
age methodology. It started out as a clustered file system and
evolved into much more. Today it’s a full-featured set of file
management tools, including advanced storage virtualization,
integrated high availability, automated tiered storage manage-
ment, and performance to effectively manage very large quan-
tities of file-based data.
IBM Elastic Storage is designed to support a wide variety of
application workloads and has been proven extremely effec-
tive in very large, demanding environments. It is installed in
clusters supporting BD&A, gene sequencing, digital media,
and scalable file serving. All indications are that the future
of BD&A includes bringing more unstructured and file-based
data into the application.
An Elastic Storage cluster can be a single server, two servers
providing a high-availability platform supporting a database
application, or thousands of servers used for applications
such as the modeling of weather patterns, or supporting the
needs of advanced analytics leveraging file-based data. IBM
Elastic Storage was designed to support high-performance
workloads and has been proven effective for a variety of BD&A
applications.
Platform Computing
When you have big data problems to solve, you need to maxi-
mize the potential of your computing power and the support-
ing infrastructure to accelerate your applications at scale,
extract insight from your data, and make better decisions
faster. IBM Platform Computing provides that by pooling your
computing resources, managing them efficiently across mul-
tiple groups, and getting the most out of your IT investment.
For those looking to accelerate Hadoop performance, IBM
Platform Computing’s Symphony delivers peak performance
as measured by Terasort and UCB SWIM results. Symphony is
a low-latency grid management platform for SOA and distrib-
uted applications. It supports diverse compute and data inten-
sive applications.