2016-06-29
Beyond TCO
Architecting Hadoop for adoption and data applications
Reid Levesque – Head, Solution Engineering
Introduction
Topics
Technology • Use cases • Deployment • Impact • Next steps
Technology – Let’s talk Hadoop
Every company is a technology company… some just don’t know it yet.
Traditional systems under pressure
Conventional wisdom
• Put the code on an Application Server
• Move the data to/from database
• Move the data to/from NAS
Reality check
• This works well for small amounts of data
• As data volumes increase, this design falls apart
Hadoop to the rescue
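Hadoop’s answer to the pressure above is to move the code to the data instead of shuttling data to an application server. A minimal sketch of that pattern, assuming PySpark and using hypothetical paths and column names (an illustration, not the actual workload from this rollout):

    # Minimal PySpark sketch: the aggregation runs on the cluster nodes that
    # hold the data, instead of pulling every record back to an app server.
    # Paths and column names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("code-to-data-sketch").getOrCreate()

    trades = spark.read.parquet("hdfs:///data/trades/2016/")  # hypothetical dataset

    daily_totals = (trades
                    .groupBy("trade_date", "desk")
                    .agg(F.sum("notional").alias("total_notional")))

    daily_totals.write.mode("overwrite").parquet("hdfs:///data/reports/daily_totals/")

Only the small aggregated result ever leaves the cluster; the heavy scan happens where the blocks already live.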
How do we get Hadoop into the organization?
How about these use cases?
File archive + Hadoop
• Data is online; no need for tape backup
• Cheaper than NAS / SAN
• Increased performance / scalability
• Metadata is easier to get; all the data is in one spot

Database replacement + Hadoop
• Improved performance
• Lower TCO
• Reduced dependence on proprietary software
• Reduce RDBMS licensing

ETL off-load + Hadoop
• Reduced operational cost for analysis
• Improved functionality with stored XML
• Lower TCO

Data-intensive grid compute analytics + Hadoop
• Additional analytic capability
• Better hardware utilization
• Lower platform management
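As one illustration of the ETL off-load use case above: land raw extracts in HDFS and run the transform on the cluster instead of in the RDBMS. A hedged sketch, assuming PySpark; the paths and columns are hypothetical:

    # Illustrative ETL off-load sketch (hypothetical paths and columns):
    # read raw landed files, standardize them, publish partitioned Parquet
    # for downstream analysis rather than transforming inside the database.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-offload-sketch").getOrCreate()

    raw = spark.read.option("header", True).csv("hdfs:///landing/positions/*.csv")

    clean = (raw
             .withColumn("as_of_date", F.to_date("as_of_date", "yyyy-MM-dd"))
             .withColumn("market_value", F.col("market_value").cast("double"))
             .dropDuplicates(["account_id", "as_of_date", "instrument_id"]))

    (clean.write
          .mode("overwrite")
          .partitionBy("as_of_date")
          .parquet("hdfs:///warehouse/positions/"))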
Not so much
• File archive + Hadoop
• Data-intensive grid compute analytics + Hadoop
• Database replacement + Hadoop
• ETL off-load + Hadoop
Every one of these pitches comes down to the same thing: TCO.
Which use case did work?
• The current batch run was taking 4 hours, which limited the way users did their jobs
• Users wanted interactive response times to design and test their financial models
• This was net new functionality that could only be achieved in Hadoop (see the sketch below)
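The pattern behind that win is fanning the model out across the cluster so scenarios are evaluated in parallel rather than serially in a multi-hour batch. A toy sketch of the idea, assuming PySpark; the model, names, and sizes are illustrative, not the bank’s actual code:

    # Hedged grid-compute sketch: distribute a stand-in pricing function
    # across the cluster so many scenarios run in parallel.
    from pyspark.sql import SparkSession

    def price_scenario(scenario):
        """Stand-in for a financial model; returns (scenario_id, value)."""
        rate_shift, fx_shift = scenario["rate_shift"], scenario["fx_shift"]
        value = 1_000_000 * (1 + rate_shift) * (1 + fx_shift)  # toy calculation
        return (scenario["id"], value)

    spark = SparkSession.builder.appName("scenario-grid-sketch").getOrCreate()

    scenarios = [{"id": i, "rate_shift": i * 0.0001, "fx_shift": -i * 0.00005}
                 for i in range(100_000)]

    results = spark.sparkContext.parallelize(scenarios).map(price_scenario).collect()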
Now TCO makes more sense
• File archive + Hadoop
• Data-intensive grid compute analytics + Hadoop
• Database replacement + Hadoop
• ETL off-load + Hadoop
With Hadoop TCO covered, previous use cases are now more compelling.
How do we deploy this?
Which distribution?
Pick one:
Time to pick the hardware
Is this true?
Commodity hardware + commodity networking = bad architecture
Before there was Hadoop, there were enterprise IT standards
To name a few conflicts during the rollout…
• Local account UID / names
• OS settings
• Root access
• File locations
• Standard mount sizes
• Enterprise Active Directory
• Monitoring systems
Hadoop is NOT flexible on deployment requirements
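One way to live with that rigidity is to check each node against Hadoop’s expectations before it joins the cluster. A rough preflight sketch in Python; the thresholds, file paths, and mount points are assumptions for illustration, not the standards from this rollout:

    # Illustrative node preflight check: verify a few OS settings that Hadoop
    # commonly expects and that often conflict with default enterprise builds.
    import os

    def read_setting(path):
        with open(path) as f:
            return f.read().strip()

    checks = [
        ("vm.swappiness <= 10",
         int(read_setting("/proc/sys/vm/swappiness")) <= 10),
        ("transparent hugepages disabled",
         "[never]" in read_setting("/sys/kernel/mm/transparent_hugepage/enabled")),
        ("data disks mounted",  # assumes 12 JBOD mounts at /data/1 .. /data/12
         all(os.path.ismount(f"/data/{i}") for i in range(1, 13))),
    ]

    for name, ok in checks:
        print(f"{'PASS' if ok else 'FAIL'}: {name}")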
Who does the work?
Single team including:
• Dedicated infrastructure team (Compute, Network, Data Center, Operations)
• Dedicated Hadoop team (sysadmin/operations, engineering)
• Hardware vendor engineers
• Hadoop distribution engineers
Into production we go!
What was the impact?
Changing perceptions
Impact across the organization
Infrastructure
• Networking / Data Center designs
• Relationship with storage, cloud, virtualization capabilities
• Generating analytic use cases
Development
• Mega-attractor for talent
• Application consolidation
• Shifting from IT to business focus
Management
• Understanding (or accepting) the new paradigm
• Cross-department architecture alignment
• Data focus rather than application focus
Business
• Continuously evolving understanding of capabilities / possibilities
• Next-generation IT with a rapidly evolving ecosystem
• Self-service innovation for business users
Lessons Learned
Hadoop doesn’t remove hardware maintenance
Hadoop development is still development!
New paradigm – requires skilled developers
A whole new set of error messages to decode
There aren’t that many experts
Where do we go next?
Self-service tools
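Self-service started with simple things, such as letting users query files (a 2 GB CSV, for example) that desktop tools cannot open; BI tools like Arcadia or Datameer drive roughly this kind of query on the users’ behalf. A minimal sketch of the underlying idea, assuming Spark SQL and hypothetical file and column names:

    # Self-service sketch: expose a large CSV as a table that can be queried
    # with plain SQL. File name and columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("self-service-sketch").getOrCreate()

    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("hdfs:///user/shared/transactions_2016.csv"))
    df.createOrReplaceTempView("transactions")

    spark.sql("""
        SELECT branch, COUNT(*) AS txn_count, SUM(amount) AS total_amount
        FROM transactions
        GROUP BY branch
        ORDER BY total_amount DESC
    """).show(20)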
Selling Hadoop internally
• This journey has taught me a lot about Hadoop; more than most people at the organization
• The biggest tasks are educating the organization and doing simple things as a first step
Thank You


Editor's Notes

  • #2 Beyond Total Cost of Ownership
  • #3 I’m Reid Levesque. I work at RBC and this is the story of how we did the impossible: We convinced a Canadian Bank to deploy Hadoop.
  • #4 I want to tell you about my experiences getting Hadoop set up in a bank.
  • #5 I want to make sure that everyone is on the same page with why Hadoop is so awesome.
  • #6 Big Data was and is one of the hottest buzzwords.
  • #7 We had a lot of “traditional” applications.
  • #8 Hadoop is clearly the solution and I’m not going to bore you with how Hadoop works. As a techie it’s easier to find an appropriate technology than it is to get it embedded into an organization and change the culture.
  • #11 Doing the same thing with a different tool is never attractive enough. All these use cases focus on lowering the Total Cost of Ownership. By the time you add up all the project costs of switching to Hadoop and setting it up from scratch, there are no cost savings left.
  • #12 In the end we got it down to 5 minutes
  • #15 At a bank, we are not in a position to run our own Hadoop cluster from open source components. We need help and that’s what the Hadoop distros do. Turns out they’re all about the same. Each has its own special sauce but it didn’t really matter for our use case. We picked Hortonworks and off we went.
  • #17 Remember when I said “Hadoop moves the code to the data”? Well, that’s not entirely accurate. There is enough data movement between nodes in a Hadoop cluster that, if not managed correctly, it can bring down your network. (Story about browning out the cluster) For that reason we opted for fairly commodity hardware but not commodity networking.
  • #18 Current standards: specific names for accounts; RAID 5; specific folders (e.g. /app); weeks to get AD accounts. Guess what? Hadoop doesn’t fit into those standards very well. Hadoop has a very specific way of being set up, or it just won’t work. (Story about RAID 0 on all disks)
  • #19 Traditional teams would be set up with developers in one team, operations in another, and infrastructure in yet another. This doesn’t work for Hadoop; too many moving parts. We had all of those as well as vendor support in one team.
  • #22 During the initial rollout there were many naysayers and teams who were interested but didn’t want to bet the farm on it. For 2 weeks after the rollout, our success was the thing no one wanted to talk about. But then the floodgates opened and everyone was coming to get a piece of Big Data.
  • #24 The frameworks in Hadoop (MapReduce, Spark, Apex, etc.) take care of a lot of the hard bits like parallelization, elasticity, etc. However, we still need to do development to get any new functionality.
  • #26 When you start digging into the day-to-day operations of the business users, you see simple things that are difficult because of traditional tools. Things like looking at the contents of a 2 GB CSV file can be impossible. This has led us to use BI tools on top of Hadoop to let the users get value out of their data without IT support; Arcadia and Datameer are great examples. Once the users can see their data, they come up with great use cases and really start getting value out of the platform.
  • #27 It ends up that the first use case is quite simple: get data into Hadoop and do something like query across many datasets. It turns out that this is so revolutionary to so many business users that they soon want to do more work on Hadoop. Talk about use cases