CIOReview | October 2015
The Navigator for Enterprise Solutions
October 30, 2015 | CIOREVIEW.COM
Big Data Special
CXO INSIGHTS
The preservation of human knowledge is of paramount importance to progress, now and in the future. And because the vast majority of new data is stored digitally, the need for reliable digital storage is greater than ever. The challenge today is ensuring that the drives the storage industry mass-produces to keep up with the ever-growing demand for data storage are manufactured to the highest standards of quality. The solution to that challenge may lie in a relatively new but fast-growing field known as Big Data Analytics.
The need for reliable data storage is particularly urgent given that the amount of data stored every year is increasing rapidly. Indeed, far more data is generated than is actually stored. For example, CERN generates close to a petabyte of data every second as particles accelerated to nearly the speed of light are smashed together in the Large Hadron Collider. But CERN can store only approximately 25 PB of this data every year, equivalent to about 8,333 full 3 TB hard disk drives.
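The drive-count comparison above is simple arithmetic, sketched here (decimal units assumed, i.e. 1 PB = 1,000 TB):

```python
# How many 3 TB drives does CERN's ~25 PB of stored data per year fill?
STORED_PB_PER_YEAR = 25
DRIVE_CAPACITY_TB = 3

drives = STORED_PB_PER_YEAR * 1000 / DRIVE_CAPACITY_TB  # 1 PB = 1,000 TB
print(round(drives))  # about 8,333 drives
```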
When a disk drive is manufactured, it acts as an intelligent sensor that is aware of its own health and quality, and it stores its own sensor logs. These drives are tested for many days, and during that time they can generate megabytes of test, diagnostic, and configuration data, with as many as 1,000 variables logged for each drive. In addition, information is collected about every important component going into each drive: how these components are combined, where and when each component and each drive was built, which firmware is used, which customer it goes to, and many other pieces of information.
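A per-drive record of this kind might be modeled roughly as follows; the field names are illustrative assumptions, not Seagate's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DriveRecord:
    """Hypothetical per-drive manufacturing record (illustrative only)."""
    serial_number: str
    factory: str                    # where the drive was built
    built_on: str                   # when it was built (ISO date)
    firmware: str                   # firmware revision flashed at test
    components: dict = field(default_factory=dict)  # part -> supplier lot
    test_log: dict = field(default_factory=dict)    # variable name -> value

record = DriveRecord(
    serial_number="ZA1234567",
    factory="Plant-A",
    built_on="2015-10-01",
    firmware="FW01.2",
    components={"head_assembly": "lot-778", "media": "lot-412"},
    test_log={"servo_error_rate": 0.0003, "fly_height_nm": 9.8},
)
print(len(record.test_log))  # in practice, up to ~1,000 logged variables
```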
The resulting combination of parameters, attributes, and measurements can produce hundreds of thousands of combinations and resulting interdependencies. Analyzing these combinations, both individually and together, requires new methods, new tools, and new ideas to separate the key signals from the noise. So many variables and parameters affect drive quality, reliability, and performance that no traditional data analysis approach can easily handle the data generated and collected during the manufacturing process.
Using Big Data Analytics to Produce High-Quality Big Data Storage

By Andrei Khurshudov, Chief Technologist, Seagate; Mark Brewer, SVP and CIO, Seagate Technology; and Michael Crump, VP of Quality, Seagate Technology
How do we address this drive quality and reliability challenge? Through Big Data Analytics, which combines techniques such as advanced statistics and machine learning with large amounts of data to extract answers that are not visible to more traditional analytics operating on smaller data sets. With so much data available, Big Data Analytics can help control product quality and troubleshoot issues as quickly as possible.
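As a toy illustration of pulling a key signal out of many logged variables, the sketch below plants one synthetic variable that actually drives a failure label among dozens that are pure noise, then ranks all variables by correlation with that label. The data and variable indices are invented for the example, not real drive telemetry:

```python
import random

random.seed(42)

N_DRIVES, N_VARS = 500, 50
# Synthetic test logs: 50 variables per drive, all standard-normal noise ...
logs = [[random.gauss(0, 1) for _ in range(N_VARS)] for _ in range(N_DRIVES)]
# ... except that the failure label is driven mostly by variable 7.
fails = [1.0 if logs[i][7] + random.gauss(0, 0.5) > 1.0 else 0.0
         for i in range(N_DRIVES)]

def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rank variables by absolute correlation with the failure label.
ranked = sorted(
    range(N_VARS),
    key=lambda v: abs(pearson([row[v] for row in logs], fails)),
    reverse=True,
)
print(ranked[0])  # the planted variable 7 should top the ranking
```

Real manufacturing data is far messier than this, which is exactly why more advanced machine-learning techniques are needed, but the principle of ranking candidate signals against an outcome is the same.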
The first thing we need in order to implement Big Data Analytics that ensures magnetic hard drive reliability is a robust, coherent, end-to-end data collection process: one that captures everything that could be important and offers it for further analysis. This data will be available when it is needed and found where it is expected. And it is coherent in the sense that all those pieces of data can be matched together as needed. A hard drive is subject to this process starting from the time and place where each main component is "born", through the drive factory and its assembly lines, through days of configuration and testing, to the customers who use the drives to build computers or storage systems, to the end user, all the way to the end of the drive's life.
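"Coherent" here means the pieces can be joined. A minimal sketch, assuming a common serial-number key and invented source names, of matching genealogy, test, and field data for one drive:

```python
# Three hypothetical data sources keyed by drive serial number (illustrative).
genealogy = {"ZA1234567": {"factory": "Plant-A", "head_lot": "lot-778"}}
test_logs = {"ZA1234567": {"servo_error_rate": 0.0003, "test_days": 6}}
field_returns = {"ZA1234567": {"returned": False}}

def coherent_view(serial):
    """Join every source's record for one drive into a single dict."""
    merged = {"serial": serial}
    for source in (genealogy, test_logs, field_returns):
        merged.update(source.get(serial, {}))
    return merged

view = coherent_view("ZA1234567")
print(view["factory"], view["servo_error_rate"], view["returned"])
```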
Second, we need a storage infrastructure and an ecosystem that lend themselves to Big Data Analytics and complex data mining. That means a more traditional Enterprise Data Warehouse architecture running relational databases should be complemented by (and linked to) solutions designed for distributed analytics and parallel computing: a modern ecosystem with Hadoop/Spark capabilities, NoSQL databases (such as MongoDB and Cassandra), and the ability to store all possible data, both structured and unstructured, and access it in parallel for better performance.
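The structured/unstructured split can be illustrated with standard-library stand-ins: a fixed-schema relational table alongside schemaless JSON documents, linked on a shared key. Here sqlite3 and JSON strings are illustrative substitutes for an actual warehouse and a NoSQL store such as MongoDB:

```python
import json
import sqlite3

# Structured side: a fixed-schema "warehouse" table (sqlite3 stands in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE drives (serial TEXT PRIMARY KEY, factory TEXT)")
db.execute("INSERT INTO drives VALUES ('ZA1234567', 'Plant-A')")

# Unstructured side: schemaless documents, one per test event.
documents = [
    json.dumps({"serial": "ZA1234567", "vars": {"fly_height_nm": 9.8}}),
    json.dumps({"serial": "ZA1234567", "note": "retest after firmware update"}),
]

# Linking the two: pull the structured row, attach the matching documents.
serial = "ZA1234567"
(factory,) = db.execute(
    "SELECT factory FROM drives WHERE serial = ?", (serial,)
).fetchone()
docs = [json.loads(d) for d in documents if json.loads(d)["serial"] == serial]
print(factory, len(docs))
```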
Third, we need trained personnel using Big Data Analytics algorithms and solutions: true Data Scientists capable of working with extremely large data sets using the most advanced machine-learning techniques, and of seamlessly linking together the best programming environments and languages, machine-learning libraries, and elements of a highly distributed storage and analytics ecosystem. These people can understand the complex data generated through testing and guarantee the best product quality, reliability, and performance possible.
This is the approach that Seagate has implemented, and it has already resulted in a dramatic improvement in the quality of our products, which means more data can be preserved to retrieve and use in the future.
Modern challenges require modern approaches. Making highly reliable devices to store all of the data generated in today's world, and mass-producing these devices by the tens of millions per quarter, is impossible without Big Data Analytics and Machine Learning technology. These are now a requirement for any leading high-volume technology company in the 21st century.
"Seagate's reputation for quality and reliability in its products is driven by our manufacturing excellence and supply chain efficiency"