Hadoop and Vertica at Snagajob: How Big Data Technologies Drive Business Results
Transcript of a BriefingsDirect podcast on how an employment search company is using data
analysis to bring better matching for job seekers and employers.
Listen to the podcast. Find it on iTunes. Sponsor: HP
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm
Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this
ongoing sponsored discussion on IT innovation and how it’s making an impact
on people’s lives.
Once again, we're focusing on how companies are adapting to the new style of
IT to improve IT performance and deliver better user experiences, as well as
better business results.
This time, we're coming to you directly from the HP Big Data 2014 Conference in Boston. We're
here the week of August 11 to learn directly from IT and business leaders alike how big data,
cloud, and converged infrastructure implementations are supporting their goals.
Our next innovation case study interview highlights how Snagajob in Richmond, Virginia, one of
the largest hourly employment networks for job seekers and employers, is using big data to
improve its performance, as well as to better understand how its systems
provide services to end users in a very rapid environment.
Snagajob recently delivered almost half a million new jobs in a
single month through its systems. So the scale here is very impressive. To learn how
they're managing that, we're here with Robert Fehrmann, the Data Architect at
Snagajob in Richmond, Virginia. Welcome to the show.
Robert Fehrmann: Dana, thank you for the introduction.
Gardner: First, tell us about your organization. How is hourly employment different from regular
employment? What type of employment are we talking about? You've been around since 2000
and you've been doing this successfully. Let's understand the role you play in the employment
market.
Fehrmann: Snagajob, as you mentioned, is America's largest hourly network for employees and
employers. The hourly market means we have, relatively speaking, high turnover.
Another aspect, in comparison to some of our competitors, is that we provide an inexpensive
service. So our subscriptions are on the low end, compared to our competitors.
Gardner: Tell us how you've used big data to improve your operations. I believe that among the
first ways that you’ve done that is to try to better analyze your performance metrics. What were
you facing as a problem when it came to performance metrics?
Signs of stress
Fehrmann: A couple of years ago, we started looking at our environment, and it became
obvious that our traditional technology was showing some signs of stress. As you mentioned, we
really have data at scale here. We have 20,000 to 25,000 postings per day, and
we have about 700,000 unique visitors on a daily basis. So data is coming in
very, very quickly.
We also realized that we were sitting on a gold mine. We were able to ingest
data pretty well, but we had a problem getting information and innovation out of
our big data lake.
Gardner: And of course, real time is important. You want to catch degradation
in any fashion from your systems right away. How do you then go about
getting this in real time? How do you do the analysis?
Fehrmann: We started using Hadoop. I'll use a lot of technical terms here. From our website,
we're getting events. Events are routed via Flume directly into Hadoop. We're collecting about
600 million key-value pairs on a daily basis. It's a massive amount of data, 25 gigabytes on a
daily basis.
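As a back-of-the-envelope check, the volumes Fehrmann cites imply a steady ingest rate and a small average event size. This is a sketch using only the figures above; the exact byte convention (GB vs. GiB) is an assumption:

```python
# Rough throughput implied by the stated figures:
# 600 million key-value pairs and ~25 GB ingested per day.
PAIRS_PER_DAY = 600_000_000
BYTES_PER_DAY = 25 * 1024**3  # assuming ~25 GiB
SECONDS_PER_DAY = 24 * 60 * 60

pairs_per_second = PAIRS_PER_DAY / SECONDS_PER_DAY
avg_pair_size = BYTES_PER_DAY / PAIRS_PER_DAY

print(f"~{pairs_per_second:,.0f} pairs/sec")   # ~6,944 pairs/sec
print(f"~{avg_pair_size:.0f} bytes per pair")  # ~45 bytes per pair
```

So the events themselves are tiny; the challenge is sustained write volume, which is exactly what a Flume-to-Hadoop pipeline is built for.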
The second piece in this journey to big data was analyzing these events, and that's where we're
using Vertica. Our original use case was to analyze a funnel. A funnel is where people
come to our site. They're searching for jobs, maybe by keyword, maybe by zip code. A subset of
those show interest in a job and click on a posting. A subset of those apply for the job via
an application. A subset show interest in an employer, and so on. We had never been able to analyze
this funnel.
The dataset is about 300 to 400 million rows and 30 to 40 gigabytes. We wanted to make this
data available, not just to our internal users, but also to external users. Therefore, we set ourselves a
goal of a five-second response time: no query on this dataset should run for more than five
seconds, and Vertica and Hadoop gave us a solution for this.
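The funnel Fehrmann describes amounts to counting distinct users at each successive stage. Here is a minimal sketch of that idea; the event names and flat event-log shape are illustrative assumptions, not Snagajob's actual schema:

```python
# Sketch of funnel analysis: count distinct users reaching each stage.
# Stage names and record layout are hypothetical, for illustration only.
FUNNEL_STAGES = ["search", "posting_click", "application_start", "application_submit"]

events = [
    {"user": "u1", "event": "search"},
    {"user": "u1", "event": "posting_click"},
    {"user": "u1", "event": "application_start"},
    {"user": "u2", "event": "search"},
    {"user": "u2", "event": "posting_click"},
    {"user": "u3", "event": "search"},
]

def funnel_counts(events, stages):
    """Return (stage, distinct-user count) pairs in funnel order."""
    users_by_stage = {s: set() for s in stages}
    for e in events:
        if e["event"] in users_by_stage:
            users_by_stage[e["event"]].add(e["user"])
    return [(s, len(users_by_stage[s])) for s in stages]

print(funnel_counts(events, FUNNEL_STAGES))
# [('search', 3), ('posting_click', 2), ('application_start', 1), ('application_submit', 0)]
```

At Snagajob's scale the same aggregation would run as SQL against the columnar store rather than in application code, which is what makes the five-second target feasible.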
Gardner: Any metrics of success? How have you been able to increase your performance, reach
your key performance indicators (KPIs), and meet your service-level agreements (SLAs)? How has this
benefited you?
Fehrmann: Another application that we were able to implement is a recommendation engine. A
recommendation engine addresses the case where job seekers who apply for a specific job may not
know about all the other jobs that are very similar to it, or that other people have also applied to.
We started analyzing the search results that we were getting and implemented a recommendation
engine. Sometimes it's very difficult to make a real comparison between before and after. Here, we
were able to see that we got an 11 percent increase in application flow. Application flow is how
many applications a customer is getting from us. By implementing this recommendation engine,
we saw an immediate 11 percent increase in application flow, one of our key metrics.
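One simple way to build the kind of recommendation Fehrmann describes is item-to-item co-application: jobs are "similar" when the same seekers applied to both. The data and scoring below are illustrative assumptions, not Snagajob's actual engine:

```python
from collections import defaultdict

# Hypothetical applicant -> applied-jobs data, for illustration only.
applications = {
    "alice": {"cashier_01", "barista_02"},
    "bob":   {"cashier_01", "barista_02", "stocker_03"},
    "carol": {"cashier_01"},
}

def co_application_scores(applications, job):
    """Score other jobs by how many applicants they share with `job`."""
    scores = defaultdict(int)
    for jobs in applications.values():
        if job in jobs:
            for other in jobs - {job}:
                scores[other] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(co_application_scores(applications, "cashier_01"))
# [('barista_02', 2), ('stocker_03', 1)]
```

A production engine would typically precompute these co-occurrence counts in batch over the full application history, then serve the top-scoring jobs alongside each posting.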
Gardner: So you took the success from your big-data implementation and analysis capabilities
from this performance task to some other areas. Are there other business areas, search yield, for
example, where you can apply this to get other benefits?
Brand-new applications
Fehrmann: When we started, we had the idea that we were looking for a solution for migrating
our existing environment, to a better-performing new environment. But what we've seen is that
most of the applications we've developed so far are brand-new applications that we hadn't been
able to do before.
You mentioned search yield. Search yield is a very interesting aspect. It's a massive dataset:
about 2.5 billion rows and about 100 gigabytes of data as of right now, and it's continuously
increasing. Across all of the applications, as well as all of the search requests that we have
collected since we started this environment, we're able to analyze the search yield.
For example, that's how many applications we get for a specific search keyword in real time. By
real time, I mean that somebody can run a query against this massive dataset and get results in a
couple of seconds. We can analyze specific jobs in specific areas, and specific keywords
searched in a specific time period or in a specific location of the country.
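The search-yield metric as described reduces to applications per search, grouped by keyword (or location, or time period). A minimal sketch, with record and field names as illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical search-event records; the fields are assumptions for illustration.
searches = [
    {"keyword": "cashier", "location": "VA", "applied": True},
    {"keyword": "cashier", "location": "VA", "applied": False},
    {"keyword": "barista", "location": "MA", "applied": True},
    {"keyword": "cashier", "location": "VA", "applied": True},
]

def search_yield(searches):
    """Per-keyword application and search counts, plus the yield ratio."""
    stats = defaultdict(lambda: [0, 0])  # keyword -> [applications, searches]
    for s in searches:
        stats[s["keyword"]][1] += 1
        if s["applied"]:
            stats[s["keyword"]][0] += 1
    return {k: {"applications": a, "searches": n, "yield": a / n}
            for k, (a, n) in stats.items()}

print(search_yield(searches))
```

Grouping by location or date instead of keyword is the same aggregation with a different key, which is why a columnar engine handles all of these slices from one dataset.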
Gardner: And once again, now that you've been able to do something you couldn't do before,
what have been the results? How has that changed your business?
Fehrmann: It really allows our salespeople to provide great information during the prospecting
phase. If we're prospecting with a new client, we can tell them very specifically that, in their
industry and their area, they can expect an application flow of, let's say, a hundred applications
per day, depending on how big the company is.
Gardner: How has this been a benefit to your end users, those people seeking jobs and those
people seeking to fill jobs?
Fehrmann: There are certainly some jobs that people are more interested in than others. On the
flip side, if a particular job gets 100 or 500 applications, it's just a fact that only a small number
of applicants are going to get that particular job. Now if you apply for a job that isn't as popular, you have a
much, much higher probability of getting the job.
Gardner: Now that you've been here at the Big Data Conference for a day or two, what's
jumping out at you? What would you like to see from HP going forward, maybe across the
HAVEn portfolio or tighter integration between Hadoop and Vertica? What's of interest,
and what would you like to see next year?
Fehrmann: I attended one of the technical tracks on Maverick. It's fantastic what's coming up in
Vertica. Second, what I'd like to see from HP is a tighter integration or to continue to integrate
Hadoop and Vertica. I think it would be great to see Vertica as sort of front end into Hadoop.
Vertica has a great analytical engine and has all the SQL-92 compliance. There's not a whole lot
of competition right now. Most other SQL distributions sitting on top of Hadoop either aren’t as
compliant in terms of standards or they don't provide all the analytical capabilities. So for HP to
reach down into the HDFS storage would be a great benefit for us.
Gardner: Very good. I'm afraid we will have to leave it there. We've been talking with Snagajob,
based in Richmond, Virginia, about how they're using big data on multiple levels to improve
their business performance and their systems' performance, and ultimately how they go about
understanding their new challenges and opportunities.
With that, I'd like to thank our guest. We’ve been joined by Robert Fehrmann, the Data Architect
at Snagajob in Richmond, Virginia. Thank you.
Fehrmann: Thank you, Dana.
Gardner: And I’d like to thank our audience as well for joining us for this special new style of
IT discussion coming to you directly from the HP Big Data 2014 Conference in Boston.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP sponsored discussions. Thanks again for listening, and do come back next time.
Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.
You may also be interested in:
• How Waste Management Builds a Powerful Services Continuum Across Operations, Infrastructure, Development and IT Practices
• GSN Games hits top prize using big data to uncover deep insights into gamer preferences
• Hybrid cloud models demand more infrastructure standardization, says global service provider Steria
• Service providers gain new levels of actionable customer intelligence from big data analytics
• How UK data solutions developer Systems Mechanics uses HP Vertica for BI, streaming and data analysis
• Advanced cloud service automation eases application delivery for global service provider NNIT
• HP network management heightens performance while reducing total costs for Nordic telco TDC
• How Capgemini's UK financial services unit helps clients manage risk using big data analysis
• Perfecto Mobile goes to cloud-based testing so developers can build the best apps faster
• Software security pays off: How Heartland Payment Systems gains steep ROI via software assurance tools and methods
• HP ART documentation and readiness tools bring better user experiences to Nordic IT solutions provider EVRY