CIOReview | The Navigator for Enterprise Solutions
October 30, 2015 | CIOREVIEW.COM
Big Data Special

Andre Fuetsch, SVP, AT&T

In My Opinion: Randy Sloan, SVP & CIO, Southwest Airlines

CIO Insights: Smarter Decisions through Big Data

TransUnion: Jim Peck, CEO & President

Company of the Month: Ty Moser, Founder, President & CEO, Moser Consulting
CXO INSIGHTS

The preservation of human
knowledge is of paramount
importance to progress,
now and in the future. And
because the vast majority of new data
is stored digitally, the need for reliable
digital storage is greater than ever. The
challenge today is ensuring that the
drives mass-produced by the storage
industry in order to keep up with the
ever-growing need for data storage are
manufactured according to the highest
standards of quality. The solution to that
challenge may lie in a relatively new but
fast-growing field known as Big Data
Analytics.
The need for reliable data storage
is particularly urgent in light of the fact
that the amount of data stored every year
is increasing rapidly. Indeed, much more
data is generated than is actually stored.
For example, CERN generates close to
a petabyte of data every second while
particles fired around the Large Hadron
Collider at velocities approaching the
speed of light are smashed together. But
CERN can only store approximately 25
PB of this data every year—equivalent
to about 8,333 full 3 TB hard disk drives.
When a disk drive is manufactured, it
acts as an intelligent sensor that is aware
of its own health and quality, and it
stores its own sensor logs. These drives
are tested for many days, and during that
time, they might generate megabytes of
test, diagnostic, and configuration data
— as many as 1,000 variables logged
for each drive. In addition, information
is collected about every important
component going into each drive, how
these components are combined, where
and when each component and each
drive was built, which firmware is used,
which customer it goes to, and many
other pieces of information.
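To make the idea concrete, here is a minimal sketch of matching each drive's test log with its component genealogy by serial number. All serial numbers, field names, and values are hypothetical, invented for illustration, and are not Seagate's actual schema:

```python
# Hypothetical sketch: join per-drive test logs with component genealogy,
# keyed on the drive serial number. Field names are illustrative only.

test_logs = {
    "SN1001": {"fly_height_nm": 9.8, "error_rate": 1.2e-9, "firmware": "FW2.1"},
    "SN1002": {"fly_height_nm": 10.4, "error_rate": 3.5e-9, "firmware": "FW2.1"},
}

genealogy = {
    "SN1001": {"head_lot": "H-17", "media_lot": "M-03", "assembly_site": "A"},
    "SN1002": {"head_lot": "H-18", "media_lot": "M-03", "assembly_site": "B"},
}

def join_drive_records(tests, components):
    """Merge test data and component provenance for every serial number
    that appears in both sources."""
    return {
        sn: {**tests[sn], **components[sn]}
        for sn in tests.keys() & components.keys()
    }

records = join_drive_records(test_logs, genealogy)
print(records["SN1001"]["head_lot"])  # "H-17"
```

In practice the join spans many more sources (supplier lots, firmware builds, factory lines), but the principle is the same: a shared key makes every piece of data about a drive matchable.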
The resulting combination
of parameters, attributes, and
measurements can produce hundreds
of thousands of combinations and
interdependencies. Analyzing
these combinations alone and together
requires new ways, new tools and new
ideas in order to separate key signals
or information from noise. There are
so many variables and parameters that
affect drive quality, reliability, and
performance that no traditional data
analysis approach can easily work on the
data generated and collected during the
manufacturing process.
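One simple way to begin separating signal from noise, sketched here on synthetic data, is to rank each test parameter by how strongly it correlates with a pass/fail outcome. Real pipelines would use far richer feature-selection and machine-learning methods; this is only the basic idea:

```python
# Illustrative sketch (synthetic data): rank test parameters by the strength
# of their linear correlation with a pass/fail outcome, to separate the few
# informative signals from the many noisy ones.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Rows: drives. 1 = drive failed. Parameter names are invented.
failed = [0, 0, 1, 0, 1, 1]
params = {
    "servo_noise": [0.1, 0.2, 0.9, 0.1, 0.8, 0.95],  # tracks failure
    "spinup_ms":   [410, 402, 405, 399, 408, 403],   # uninformative
}

ranked = sorted(params, key=lambda p: abs(pearson(params[p], failed)),
                reverse=True)
print(ranked[0])  # "servo_noise"
```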
Using Big Data Analytics to Produce High-Quality Big Data Storage

By Andrei Khurshudov, Chief Technologist, Seagate; Mark Brewer, SVP and CIO, Seagate Technology; and Michael Crump, VP of Quality, Seagate Technology
How do we address this drive quality and reliability
challenge? Through Big Data Analytics, which combines such
techniques as advanced statistics and machine learning with
large amounts of data to extract answers that are not
visible to more traditional analytics operating on smaller
data sets. With so much data available, Big Data Analytics
can help control product quality and troubleshoot issues as
quickly as possible.
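As one illustration of the statistics half of that toolbox, a basic statistical-process-control check computes control limits from an in-control baseline and flags later measurements that fall outside them. The data below is invented; it is a sketch of the technique, not of Seagate's actual process:

```python
# Sketch of a statistical-process-control check on a drive test metric.
# Control limits are set at mean +/- 3 standard deviations of an
# in-control baseline; later measurements outside them are flagged.
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    m, s = mean(baseline), stdev(baseline)
    return m - k * s, m + k * s

def flag_out_of_control(measurements, lower, upper):
    """Indices of measurements outside the control limits."""
    return [i for i, x in enumerate(measurements) if not lower <= x <= upper]

baseline = [200, 201, 199, 200, 202, 198, 200, 201, 199, 200]  # healthy units
lower, upper = control_limits(baseline)

new_batch = [199, 201, 150, 200]  # one drive far out of spec
print(flag_out_of_control(new_batch, lower, upper))  # [2]
```

The machine-learning half builds on the same flagged data, learning patterns too subtle for fixed limits to catch.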
The first thing we need in order to implement Big Data
Analytics that ensures magnetic hard drive reliability is a
robust, coherent, end-to-end data collection process which
captures everything that could be important, and offers it for
further analysis. This data will be available when it’s needed,
and found where it’s expected. And it’s coherent in the sense
that all those pieces of data can be matched together as needed.
A hard drive is subject to this process from the time
and place where each main component is “born”, to the drive
factory, through the assembly lines, days of configuration and
testing, to the customers who use it to build computers
or storage systems, to the end user, all the way to the end of
the drive’s life.
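That end-to-end coherence can be pictured as an event log keyed by drive serial number, so any drive's full history can be reassembled on demand. The stage names, dates, and serial numbers below are invented for illustration:

```python
# Hypothetical sketch of "coherent" collection: events recorded at different
# lifecycle stages are keyed by drive serial number, so the full history of
# any drive can be reassembled on demand. Stage names are illustrative.
from collections import defaultdict

events = [
    ("SN1001", "2015-03-02", "component_built", {"part": "head", "lot": "H-17"}),
    ("SN1001", "2015-03-10", "assembled",       {"line": "A3"}),
    ("SN1001", "2015-03-14", "test_complete",   {"result": "pass"}),
    ("SN1001", "2015-05-01", "shipped",         {"customer": "OEM-1"}),
]

history = defaultdict(list)
for serial, ts, stage, detail in events:
    history[serial].append((ts, stage, detail))

# ISO dates sort chronologically as strings.
timeline = [stage for ts, stage, _ in sorted(history["SN1001"])]
print(timeline)  # ['component_built', 'assembled', 'test_complete', 'shipped']
```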
Second, we need storage infrastructure and an ecosystem
that lends itself to Big Data Analytics and complex data
mining. That means that a more traditional Enterprise Data
Warehouse architecture running relational databases should
be complemented by (and linked to) solutions designed for
distributed analytics and parallel computing, providing a
modern ecosystem with Hadoop/Spark capabilities, NoSQL
databases (such as MongoDB and Cassandra), and the ability
to store all data possible (both structured and unstructured),
and access it in parallel for better performance.
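The parallel-access pattern those platforms provide is, at heart, partition-and-aggregate. The stdlib sketch below imitates it locally with a thread pool; a real deployment would run the same logic across a Hadoop or Spark cluster rather than in one process:

```python
# Local stand-in for the map/reduce pattern behind Hadoop and Spark:
# aggregate each data partition in parallel, then combine partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_stats(partition):
    """Per-partition (count, sum) of a test measurement."""
    return len(partition), sum(partition)

def parallel_mean(partitions):
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, partitions))  # "map" step
    total_n = sum(n for n, _ in parts)                     # "reduce" step
    total_sum = sum(s for _, s in parts)
    return total_sum / total_n

# Three partitions standing in for data spread across three storage nodes.
error_rates = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(parallel_mean(error_rates))  # 3.5
```

Because each partition is summarized independently, the work scales out: adding nodes adds throughput without changing the aggregation logic.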
Third, we need trained personnel using Big Data Analytics
algorithms and solutions: true Data Scientists capable of
working with extremely large data sets using the most advanced
machine-learning techniques, and seamlessly linking all the
best programming environments and languages, machine-
learning libraries, and elements of a highly-distributed storage
and analytics ecosystem together. Together they can understand
the complex data generated through testing, and guarantee the
best product quality, reliability, and performance possible.
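As a toy sketch of the modeling work such a team does, here is a nearest-centroid classifier predicting pass/fail from two test parameters. The numbers are synthetic and the method deliberately simple; production models would be built with libraries such as scikit-learn or Spark MLlib:

```python
# Toy classifier (synthetic data): predict pass/fail from two hypothetical
# test parameters by assigning each drive to the nearest class centroid.
from math import dist

train = {
    "pass": [(0.10, 400), (0.20, 405), (0.15, 398)],
    "fail": [(0.90, 430), (0.80, 440), (0.95, 435)],
}

# Centroid = per-dimension mean of each class's training points.
centroids = {
    label: tuple(sum(col) / len(pts) for col in zip(*pts))
    for label, pts in train.items()
}

def predict(point):
    """Label of the centroid closest (Euclidean distance) to the point."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))

print(predict((0.12, 401)))  # "pass"
print(predict((0.85, 433)))  # "fail"
```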
This is the approach that Seagate has implemented, and it
has already resulted in a dramatic improvement in the quality
of our products—which means more data can be preserved to
retrieve and use in the future.
Modern challenges require modern approaches. Making
highly reliable devices to store all of the data generated
in today’s world, and mass-producing those devices by the tens of
millions per quarter, would be impossible without Big Data
Analytics and Machine Learning technology. These are now a
requirement for any leading high-volume technology company
in the 21st century.
“Seagate's reputation for quality and reliability in its products is driven by our manufacturing excellence and supply chain efficiency” – Michael Crump
