Nimble Storage Leverages Big Data and Cloud to Produce
Data Performance Optimization on the Fly
Transcript of a BrieﬁngsDirect podcast on how a hybrid storage provider can analyze
operational data to bring about increased efﬁciency.
Listen to the podcast. Find it on iTunes. Sponsor: HP
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance
Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for
this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.
Once again, we're focusing on how IT leaders are improving their business performance through
better access, use, and analysis of their data and information. [Disclosure: HP is a
sponsor of BriefingsDirect podcasts.]
Our next innovation case study focuses on how optimized hybrid storage
provider Nimble Storage has leveraged big data and cloud to produce significant
gains in storage performance and efficiency.
We'll learn how Nimble Storage has leveraged the HP Vertica analytics platform to analyze
operational data on mixed-storage environments in near real time to optimize workloads.
We're going to learn more about how high-performing, cost-effective big-data processing via
cloud helps to make the best use of dynamic storage resources. It's a fascinating story.
Please join me now in welcoming our guest. We have Larry Lancaster, the Chief Data Scientist at
Nimble Storage Incorporated in San Jose, California. Welcome, Larry.
Larry Lancaster: Hi, Dana, it's great to talk to you today.
Gardner: I'm glad you could join us. As I said, it's a fascinating use case. Tell us about the
general scope of how you use data in the cloud to create this hybrid storage optimization service.
Lancaster: At a high level, Nimble Storage recognized early, near the inception of the product,
that if we were able to collect enough operational data about how our products
are performing in the ﬁeld, get it back home, and analyze it, we'd be able to
dramatically reduce support costs. Also, we can create a feedback loop that
allows engineering to improve the product very quickly, according to the
demands that are being placed on the product in the ﬁeld.
Looking at it from that perspective, to get it right, you need to do it from the
inception of the product. If you take a look at how much data we get back from the field, we
could be receiving anywhere from 10,000 to 100,000 data points per minute from each array.
Then, we bring those back home, put them into a database, and run a lot of intensive analytics
on that data.
Once you're doing that, you realize that as soon as you do something, you have this data you're
starting to leverage. You're making support recommendations and so on, but then you realize you
could do a lot more with it. We can do dynamic cache sizing. We can figure out how much cache a
customer needs based on an analysis of their real workloads.
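Lancaster doesn't spell out the sizing method on the podcast, but the idea lends itself to a minimal sketch. The Python below is a hypothetical illustration, not Nimble's algorithm: given a trace of block accesses, cache the hottest blocks first until a target fraction of reads would be hits (the function name, block size, and hit target are all assumptions).

```python
from collections import Counter

def recommend_cache_gb(block_accesses, block_size_kb=4, hit_target=0.90):
    """Estimate how much flash cache covers `hit_target` of the reads
    in a trace of accessed block IDs (hypothetical sizing heuristic)."""
    freq = Counter(block_accesses)          # access count per block
    total = sum(freq.values())
    covered = blocks_needed = 0
    # Cache the hottest blocks first until the target hit rate is met.
    for _, count in freq.most_common():
        covered += count
        blocks_needed += 1
        if covered / total >= hit_target:
            break
    return blocks_needed * block_size_kb / (1024 * 1024)  # KB -> GB

# Example: a skewed workload where a few blocks take most of the reads.
trace = [17, 17, 17, 42, 42, 99, 17, 42, 7, 17]
print(f"{recommend_cache_gb(trace):.6f} GB of cache recommended")
```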
We found that big data is really paying off for us. We want to continue to increase how much it's
paying off for us, but to do that we need to be able to do bigger queries faster. We have a team of
data scientists and we don't want them sitting here twiddling their thumbs. That’s what brought
us to Vertica at Nimble.
Using big data
Gardner: It's an interesting juxtaposition that you're using big data in order to better manage
data and storage. What better use of it? And what sort of efﬁciencies are we talking about here,
when you're able to get that data at that massive scale, do these analytics, and then go back
out into the ﬁeld and adjust? What does that get for you?
Lancaster: We have a very tight feedback loop. In one release we put out, we may make some
changes in the way certain things happen on the back end, for example, the way
NVRAM is drained. There are some very particular details around that, and we
can observe very quickly how that performs under different workloads. We can
make tweaks and do a lot of tuning.
Without the kind of data we have, we might have to have multiple cases being
opened on performance in the ﬁeld and escalations, looking at cores, and then
simulating things in the lab.
It's a very labor-intensive, slow process with very little data on which to base decisions. When you
bring home operational data from all your products in the ﬁeld, you're now talking about being
able to ﬁgure out in near real time the distribution of workloads in the ﬁeld and how people
access their storage. I think we have a better understanding of the way storage works in the real
world than any other storage vendor, simply because we have the data.
Gardner: So it's an interesting combination of a product lifecycle approach to getting data, but
also combining a service with a product in such a way that you're adjusting in real time.
Lancaster: That’s right. We do a lot of neat things. We do capacity forecasting. We do a lot of
predictive analytics to try to ﬁgure out when the storage administrator is going to need to
purchase something, rather than having them just stumble into the fact that they need to
provision new equipment because they've run out of space.
A lot of things that should have been done in storage from the very beginning that sound
straightforward were simply never done. We're the ﬁrst company to take a comprehensive
approach to it. We open and close 80 percent of our cases automatically, 90 percent of them are …

We have a suite of tools that run on this operational data, so we don't have to call people up and
say, "Please gather this data for us. Please send us these logs. Please send us these statistics."
Now, we take a case that could have taken two or three days and turn it into something that
can be done in an hour.
That's the kind of efficiency we gain that you can see, and the InfoSight service delivers that to
our customers.
Gardner: Larry, just to be clear, you're supporting both ﬂash and traditional disk storage, but
you're able to exploit the hybrid relationship between them because of this data and analysis. Tell
us a little bit about how the hybrid storage works.
Challenge for hard drives
Lancaster: At a high level, you have hard drives, which are inexpensive, but they're slow for
random I/O. For sequential I/O, they are all right, but for random I/O performance, they're slow.
It takes time to move the platter and the head. You're looking at 5 to 10 milliseconds of seek time
for a random read.
That's been the challenge for hard drives. Flash drives have come out, and they can dramatically
improve on that. Now, you're talking about microsecond-order latencies rather than milliseconds.
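To put rough numbers on that gap (my arithmetic, not figures from the podcast), a quick sketch:

```python
# Back-of-the-envelope random-read rates implied by those latencies
# (representative values I'm assuming, not measurements from the podcast).
disk_latency_s = 0.007    # ~7 ms seek + rotation per random read
flash_latency_s = 0.0001  # ~100 microseconds per flash random read

print(f"Disk:  ~{1 / disk_latency_s:.0f} random reads/sec")   # ~143
print(f"Flash: ~{1 / flash_latency_s:.0f} random reads/sec")  # ~10000
```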
But the challenge there is that they're expensive. You could go buy all flash, or you could go buy
all hard drives, and live with the downsides of each. Or, you can take the best of both worlds.

Then, there's a challenge. How do I keep the data that I need to access randomly in flash, but
keep the rest of the data -- the data where I don't care so much about frequent random-read
performance -- on the hard drives only, and in that way optimize my use of flash? That's the way
you can save money, but it's difficult to do.
It comes down to having some understanding of the workloads that the customer is running and
being able to anticipate the best algorithms and parameters for those algorithms to make sure that
the right data is in ﬂash.
We've built up an enormous dataset covering thousands of system-years of real-world usage to
tell us exactly which approaches to caching are going to deliver the most beneﬁt. It would be
hard to be the best hybrid storage solution without the kind of analytics that we're doing.
Gardner: Then, to extrapolate a little bit higher, or maybe wider, for how this beneﬁts an
organization, the analysis that you're gathering also pertains to the data lifecycle, things like
disaster recovery, business continuity, backups, scheduling, and so forth. Tell us how the data
gathering analytics has been applied to that larger data lifecycle equation.
Lancaster: You're absolutely right. One of the things that we do is make sure that we audit all of
the storage that our customers have deployed to understand how much of it is protected with
local snapshots, how much of it is replicated for disaster recovery, and how much incremental
space is required to increase retention time and so on.
We have very efﬁcient snapshots, but at the end of the day, if you're making changes, snapshots
still do take some amount of space. So, exactly what is that overhead, and how can we help you
achieve your disaster recovery goals?
We have a good understanding of that in the ﬁeld. We go to customers with proactive service
recommendations about what they could and should do. But we also take into account the fact
that they may be doing disaster recovery (DR) when we forecast how much capacity they are
going to need.
You're right. It is part of a larger lifecycle that we address, but at the end of the day, for my
team it's still all about analytics. It's about looking to the data as the source of truth and as the
source of recommendation.
We can tell you roughly how much space you're going to need to do disaster recovery on a given
type of application, because we can look across our installed base and see the distribution of the
extra space that takes and what kind of bandwidth you're going to need. We have all that
information at our fingertips.
When you start to work this way, you realize that you can do things you couldn't do before. And
the things you could do before, you can do orders of magnitude better. So we're a great case of
actually applying data science to the product lifecycle, but also to front-line revenue and costs.
Gardner: I think this is a great example and I think you're a harbinger of what we're going to see
more and more, which is bringing this high level of intelligence to bear on many other different
services, for many different types of products. IT and storage is great and makes a lot of sense as
an early adopter. But I can see this is pertaining to many other vertical industries. It illustrates
where a lot of big-data value is going to go.
Now, let's dig into how you actually get that analysis at the speed, at the scale, and at the cost
that you require. Tell us about your journey through the different analytics platforms and data
architectures that you've been using and where you're headed.
Lancaster: To give you a brief history of my awareness of Vertica and my involvement around
the product, I don’t remember the exact year, but it may have been eight years ago roughly. At
some point, there was an announcement that Mike Stonebraker was involved in a group that was
going to productize the C-Store database, which was sort of an academic experiment at MIT,
to understand the benefits and capabilities of a real column store.
I was immediately interested and contacted them. I was working at another storage company at
the time. I had a 20 terabyte (TB) data warehouse, which at the time was one of the largest
Oracle on Linux data warehouses in the world.
They didn't want to touch that opportunity just yet, because they were just starting out in alpha
mode. I hooked up with them again a few years later, when I was CTO at a company called
Glassbeam, where we developed what's substantially an extract, transform, and load (ETL)
platform.

By then, they were well along the road. They had a great product, and it was solid. So we tried it
out and, I have to tell you, I fell in love with Vertica because of the performance benefits that it
delivered.
When you start thinking about collecting as many different data points as we like to collect, you
have to recognize that you're going to end up with a couple of choices on a row store. Either
you're going to have very narrow tables, and a lot of them, or else you're going to waste a lot of
I/O overhead retrieving entire rows where you just need a couple of fields.
That was what piqued my interest at ﬁrst. But as I began to use it more and more at Glassbeam,
I realized that the performance beneﬁts you could gain by using Vertica properly were another
order of magnitude beyond what you would expect just with the column-store efﬁciency.
That's because of certain features that Vertica allows, such as something called pre-join
projections. We can drill into that sort of stuff more if you like but, at a high level, it lets you
maintain the normalized logical integrity of your schema, while having, under the hood,
optimized denormalized query performance physically on disk.
Now you might ask how you can be efficient if you have a denormalized structure on disk. It's
because Vertica allows you to do some very efficient types of encoding on your data. So all of the
low-cardinality columns that would have been wasting space in a row store end up taking almost
no space at all.
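Lancaster doesn't show a schema, but here is a hedged sketch of what a pre-join projection with aggressive encoding can look like, issued through the open-source vertica-python client. The table, columns, and connection details are hypothetical; the DDL follows Vertica's documented CREATE PROJECTION form, with run-length encoding (RLE) on the low-cardinality columns.

```python
import vertica_python  # open-source Vertica client library

conn_info = {"host": "vertica.example.com", "port": 5433, "user": "dbadmin",
             "password": "...", "database": "telemetry"}  # all hypothetical

# A pre-join projection: the logical schema stays normalized (a fact
# table joined to a dimension table), while Vertica stores the joined,
# sorted result on disk. Low-cardinality columns get run-length
# encoding (RLE), so long runs of repeated values take almost no space.
DDL = """
CREATE PROJECTION sensor_by_array
    (array_id ENCODING RLE, model ENCODING RLE, ts, latency_us)
AS SELECT f.array_id, d.model, f.ts, f.latency_us
   FROM sensor_fact f JOIN array_dim d ON f.array_id = d.array_id
   ORDER BY f.array_id, f.ts
   SEGMENTED BY HASH(f.array_id) ALL NODES;
"""

with vertica_python.connect(**conn_info) as conn:
    conn.cursor().execute(DDL)
```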
What you ﬁnd, at least it's been my impression, is that Vertica is the data warehouse that you
would have wanted to have built 10 or 20 years ago, but nobody had done it yet.
Nowadays, when I'm evaluating other big-data platforms, I always have to look at them from the
perspective of: it's great, we can get some parallelism here, and there are certain operations we
can do that might be difficult on other platforms, but I always have to compare them to Vertica.
Frankly, I always find that Vertica comes out on top in terms of features and performance.
Gardner: When you arrived there at Nimble Storage, what were they using, and where are you
now on your journey into a transition to Vertica?
Lancaster: I built the environment here from the ground up. When I got here, there were
roughly 30 people. It's a very small company. We started with Postgres. We started with
something free. We didn’t want to have a large budget dedicated to the backing infrastructure just
yet. We weren’t ready to monetize it yet.
So, we started on Postgres, and we've scaled up now to the point where we have about 100 TB
on Postgres. We get decent performance out of the database for the things that we absolutely
need to do, which are micro-batch updates and transactional activity. We get that performance
because the database lives on Nimble Storage.
I don't know what the largest unsharded Postgres instance is in the world, but I feel like I have
one of them. It's a challenge to manage and leverage. Now, we've gotten to the point where we're
really enjoying doing larger queries. We really want to understand the entire installed base, and
to do analyses that extend across all of it.
We want to understand the lifecycle of a volume. We want to understand how it grows, how it
lives, what its performance characteristics are, and then how gradually it falls into senescence
when people stop using it. It turns out there is a lot of really rich information that we now have
access to that lets us understand storage lifecycles in a way I don't think was possible before.
But to do that, we need to take our infrastructure to the next level. So we've been doing that.
We've loaded a large amount of our sensor data -- that's the numerical data I talked about --
into Vertica, started to compare the queries, and then started to use Vertica more and more for
all the analysis we're doing.
Internally, we're using Vertica, just because of the performance beneﬁts. I can give you an
example. We had a particular query, a particularly large query. It was to look at certain aspects of
latency over a month across the entire installed base to understand a little bit about the
distribution, depending on different factors, and so on.
We ran that query in Postgres, and depending on how busy the server was, it took anywhere
from 12 to 24 hours to run. On Vertica, to run the same query on the same data takes anywhere
from three to seven seconds.
I anticipated that, because we were aware upfront of the beneﬁts we'd be getting. I've seen it
before. We knew how to structure our projections to get that kind of performance. We knew what
kind of infrastructure we'd need under it. I'm really excited. We're getting exactly what we
wanted and better.
This is only a three-node cluster. Look at the performance we're getting. On the smaller queries,
we're getting sub-second latencies. On the big ones, we're getting sub-10-second latencies. It's
absolutely amazing. It's game changing.
People can sit at their desktops now, manipulate data, come up with new ideas, and iterate
without having to run a batch and go home. It's a dramatic productivity increase. Data scientists
tend to be fairly impatient. They're highly paid people, and you don’t want them sitting at their
desk waiting to get an answer out of the database. It's not the best use of their time.
Gardner: Larry, is there another aspect to the Vertica value when it comes to the cloud model
for deployment? It seems to me that if Nimble Storage continues to grow rapidly and scales that,
bringing all that data back to a central single point might be problematic. Having it distributed or
in different cloud deployment models might make sense. Is there something about the way
Vertica works within a cloud services deployment that is of interest to you as well?
Lancaster: There's the ease of adding nodes without downtime, and the fact that you can create a
K-safe cluster. If my cluster is 16 nodes wide now and I want two nodes of redundancy, it's very
similar to RAID. You can specify that, and the database will take care of it for you. You don't
have to worry about the database going down and losing data as a result of a node failure
or two.
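For reference, that redundancy level is a single setting in Vertica. A minimal sketch, reusing the hypothetical connection details from the earlier snippet (MARK_DESIGN_KSAFE is Vertica's built-in function for this):

```python
import vertica_python

conn_info = {"host": "vertica.example.com", "port": 5433, "user": "dbadmin",
             "password": "...", "database": "telemetry"}  # hypothetical

# K-safety: Vertica maintains enough buddy (replica) projections that
# the cluster can lose up to K nodes without going down or losing data.
with vertica_python.connect(**conn_info) as conn:
    conn.cursor().execute("SELECT MARK_DESIGN_KSAFE(2);")  # K = 2
```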
I love the fact that you don’t have to pay extra for that. If I want to put more cores or nodes on it
or I want to put more redundancy into my design, I can do that without paying more for it. Wow!
That’s kind of revolutionary in itself.
It's great to see a database company incented to give you great performance. They're incented to
help you work better with more nodes and more cores. They don't have to worry about people
not being able to pay the additional license fees to deploy more resources. In that sense, it's great.
We have our own private cloud -- that’s how I like to think of it -- at an offsite colo. We do DR
through Nimble Storage. At the same time, we have a K-safe cluster. We had a hardware glitch
on one of the nodes last week, and the other two nodes stayed up, served data, and everything
kept running.
Those kinds of features are critical, and that ability to be ﬂexible and expand is critical for
someone who is trying to build a large cloud infrastructure, because you're never going to know
in advance exactly how much you're going to need.
If you do your job right as a cloud provider, people just want more and more and more. You want
to get them hooked and you want to get them enjoying the experience. Vertica lets you do that.
Gardner: Well, very good. I'm afraid we'll have to leave it there. We've been learning about how
optimized hybrid storage provider Nimble Storage has leveraged big data and cloud to produce
unique storage performance analytics and efficiencies. And we've seen how the HP Vertica
analytics platform has been used to analyze their operational data across mixed-storage
environments in near real time, so that they can optimize their workloads and also extend the
benefits to the larger data lifecycle.
So, a big thank you to our guest. We've been joined by Larry Lancaster, the Chief Data Scientist
at Nimble Storage. Thank you, Larry.
Lancaster: Thanks, Dana.
Gardner: Also, thank you to our audience for joining us for this special HP Discover
Performance Podcast discussion.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP-sponsored discussions. Thanks again for joining, and come back next time.
Listen to the podcast. Find it on iTunes. Sponsor: HP
Transcript of a BrieﬁngsDirect podcast on how a hybrid storage provider can analyze
operational data to bring about increased efﬁciency. Copyright Interarbor Solutions, LLC,
2005-2013. All rights reserved.