"Big data & frameworks: no book for you anymore" Роман Никитченко

When your clients need only a small database for a personal music library and some kind of HTTP interface to it, everything looks nice, and you can use a lot of bright frameworks and trusted approaches for your application.
But what changes if you step beyond existing solutions to deliver things like population health management?
Let's talk about our Big Data experience and meaningful framework usage:
What makes the difference when you go Big Data and Hadoop.
Frameworks and big data: hamsters vs hipsters.
Reality matters. Frameworks cost. How much?
What framework is good for you?
Making your own frameworks.

"Big data & frameworks: no book for you anymore" Роман Никитченко

  1. Roman Nikitchenko, 22.02.2015. SUBJECTIVE. BIG DATA FRAMEWORKS: NO BOOK FOR YOU ANYMORE
  2. frameworksdays.com. WHAT WE WANT FROM FRAMEWORKS. CHEAPER: no more reinventing the bicycle. FASTER: shorter time to market, since part of the job is already done. BETTER: the quality of proven approaches.
  3. WHAT WE OFTEN GET FROM FRAMEWORKS
  4. CAN CHIMPS DO BIG DATA? A real book with this shocking title is available for pre-order. This is exactly what is happening in the Big Data industry now. Roses are red. Violets are blue. We do Hadoop. What about YOU?
  5. BIG DATA IS ABOUT... SCALE. GET CHIMPS OUT OF THE DATACENTER
  6. SO HOW TO DO FRAMEWORKING... WHEN YOU DO BIG DATA
  7. YARN. We do Big Data with Hadoop.
  8. FRAMEWORK: an essential supporting structure of a building, vehicle, or object. In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.
  9. FRAMEWORKS DICTATE APPROACH. Frameworks exist to reduce the amount of work through reuse. The more you can reuse, the better. But complex frameworks are too massive to be flexible. They limit your solutions. When doing Big Data you usually build a unique solution.
  10. SO DO I NEED UNIQUE FRAMEWORKS FOR EVERY BIG DATA PROJECT?
  11. BIG DATA: HADOOP as INFRASTRUCTURE
  12. LOOKS LIKE THIS
  13. INFRASTRUCTURE: 3 SIMPLE HADOOP PRINCIPLES. An OPEN SOURCE framework for big data, covering both distributed storage and processing. Provides RELIABILITY and fault tolerance by SOFTWARE design; for example, the file system uses a replication factor of 3 by default. Horizontal scalability from a single computer up to thousands of nodes.
  14. HADOOP INFRASTRUCTURE AS A FRAMEWORK ● Is formed from a large number of unified nodes. ● Nodes are replaceable. ● Simple hardware without sophisticated I/O. ● Reliability by software. ● Horizontal scalability.
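Slides 13 and 14 describe reliability by software: every block is stored several times on different, replaceable nodes. A minimal sketch of that idea follows, assuming a simplified round-robin placement (real HDFS placement is rack-aware and more involved). The names `place_replicas` and `readable` are hypothetical and exist only for this illustration.

```python
REPLICATION_FACTOR = 3  # the default replication factor mentioned on slide 13


def place_replicas(block_id, nodes, replication=REPLICATION_FACTOR):
    """Pick `replication` distinct nodes for a block (simplified round-robin).

    Assumption: this is NOT the real HDFS placement policy, only a toy model.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def readable(replica_nodes, failed_nodes):
    """A block stays readable while at least one replica sits on a live node."""
    return any(node not in failed_nodes for node in replica_nodes)


nodes = ["node-%d" % i for i in range(5)]
replicas = place_replicas(block_id=7, nodes=nodes)

# Losing any two of the three replica nodes still leaves one copy to read.
survives_two_failures = readable(replicas, set(replicas[:2]))
```

With three replicas, the block is lost only when all three nodes holding it fail, which is why simple hardware is acceptable in this model.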
  15. FRAMEWORKS: INFRASTRUCTURE, APPROACH, COMPLEXITY, LIMITATIONS, OVERHEAD
  16. How everyone (who usually sells something) depicts Hadoop complexity: GREAT BIG INFRASTRUCTURE AROUND SMALL CUTE CORE. YOUR APPLICATION: SAFE and FRIENDLY
  17. How it looks from the real user's point of view: a feeling that something is wrong. CORE HADOOP. FEAR OF COMPLETELY UNKNOWN INFRASTRUCTURE. SOMETHING YOU UNDERSTAND: YOUR APPLICATION
  18. But... imagine we have BIG DATA bricks. What should they look like?
  19. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should build unique solutions using the same approaches. ● So bricks have to be flexible.
  20. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should build a robust solution with high reliability. ● Bricks have to be simple and replaceable.
  21. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should be able to change our solution over time. ● Bricks have to be small.
  22. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● As flexible as possible. ● Focused on a specific aspect, without requiring a large infrastructure. ● Simple and interchangeable.
  23. HADOOP 2.x CORE AS A FRAMEWORK: BASIC BLOCKS ● ZooKeeper as coordination service. ● HDFS as file system layer. ● YARN as resource management. ● MapReduce as basic distributed processing option.
  24. HADOOP HAS LAYERS. HADOOP 2.x CORE: RESOURCE MANAGEMENT, DISTRIBUTED PROCESSING, FILE SYSTEM, COORDINATION
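Slides 23 and 24 name MapReduce as the basic distributed processing option. As a rough illustration of the model only, here is a word count written as plain Python map, shuffle and reduce phases running in a single process. It does not use any Hadoop API, and all function names are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases sequentially over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["we do Hadoop", "We do Big Data"])
```

In a real cluster the map and reduce calls run on many nodes and the shuffle moves data between them; only the user-written map and reduce logic resembles what is shown here.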
  25. PACKAGING ... RUBIK's CUBE STYLE ● Hadoop packaging is a non-trivial task. ● It gets more complex when you add Apache Spark, Solr or the HBase indexer.
  26. Hadoop: don't do it yourself. REUSE AS IS ● The BASIC infrastructure is quite reusable as a foundation to build on, at least until you know it well. ● Do you have the manpower to re-implement it? You'd better contribute in this case.
  27. WHERE TO GO FROM HERE?
  28. HERE PEOPLE START TO ADD EVERY FRAMEWORK THEY KNOW ABOUT...
  29. YARN. AT LEAST WE DO IT ONE BY ONE
  30. WHAT DO WE USUALLY EXPECT FROM A NEW FRAMEWORK? BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction, so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers.
  31. OOOPS... BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction, so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers. Additional cost of new framework maintenance. Additional time to learn a new approach. Lots of defects due to lack of experience with the new framework.
  32. BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction, so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers. Additional cost of new framework maintenance. Additional time to learn a new approach. Lots of defects due to lack of experience with the new framework. NONEXISTENT. ONLY TWO?
  33. JUST A FEW EXAMPLES ● Spring Batch: the main thread that started the Spring context forgot to check the task completion status. ● Apache Spark: persistence to disk was limited to 2GB due to the ByteBuffer int limitation. ● Apache HBase so far has no effective guard against client RPC timeout. ● What about binary data like hashes? No effective out-of-the-box support so far. ONLY REAL EXPERIENCE. NEW FRAMEWORKS ARE ALWAYS A HEADACHE
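The 2GB figure on slide 33 follows from arithmetic: a Java `ByteBuffer` is sized and indexed with a signed 32-bit `int`, so a single buffer cannot address more than 2^31 - 1 bytes. The sketch below reproduces that arithmetic in Python. The helper names are hypothetical, and the chunking shown is a generic workaround, not a description of how Spark handled the issue.

```python
INT_MAX = 2**31 - 1  # largest signed 32-bit int, the ceiling for one buffer
GIB = 2**30          # bytes in one gibibyte


def fits_in_single_buffer(size_bytes):
    """True if a payload of this size can be addressed by one int-indexed buffer."""
    return 0 <= size_bytes <= INT_MAX


def buffers_needed(size_bytes, chunk=INT_MAX):
    """Number of int-indexed buffers needed to hold the payload (ceiling division)."""
    return -(-size_bytes // chunk)


just_under_limit = fits_in_single_buffer(2 * GIB - 1)
at_two_gib = fits_in_single_buffer(2 * GIB)
chunks_for_five_gib = buffers_needed(5 * GIB)
```

A payload of exactly 2 GiB is already one byte too large for a single buffer, which is the kind of limit that only shows up with real data volumes.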
  34. %^#@#^&@#&#%@ !!!
  35. JUST LONGER PERSPECTIVE? When you use the same approach for a long time, you apply it more and more effectively.
  36. JAVA MESSAGE SERVICE: 1.0.2b (June 25, 2001), 1.1 (April 12, 2002), 2.0 (May 21, 2013). BUT APACHE SPARK: 0.9.0 (Feb 2, 2014), 1.0 (May 30, 2014), 1.1 (Sep 11, 2014), 1.2 (Dec 18, 2014). JUST FEEL THE SPEED DIFFERENCE
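The speed difference on slide 36 can be put into numbers by measuring the days between consecutive releases. The sketch below uses only the dates listed on the slide; `gaps_in_days` is a hypothetical helper name.

```python
from datetime import date

# Release dates exactly as listed on slide 36.
JMS_RELEASES = [date(2001, 6, 25), date(2002, 4, 12), date(2013, 5, 21)]
SPARK_RELEASES = [
    date(2014, 2, 2),
    date(2014, 5, 30),
    date(2014, 9, 11),
    date(2014, 12, 18),
]


def gaps_in_days(releases):
    """Days between each pair of consecutive releases, in order."""
    return [(later - earlier).days for earlier, later in zip(releases, releases[1:])]


jms_gaps = gaps_in_days(JMS_RELEASES)
spark_gaps = gaps_in_days(SPARK_RELEASES)
```

Every Spark gap is a few months, while the gap between JMS 1.1 and 2.0 is more than eleven years.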
  37. FULL DATA PROCESSING PLATFORM SUPPORTING YARN
  38. SO BIG DATA TECHNOLOGY BOOKS ARE ALWAYS OUTDATED. Great books, but by the time they are printed they are already old. Read original e-books with updates.
  39. DO NOT HIDE YOUR EXPERIENCE
  40. FRAMEWORKS IN BIG DATA: HAMSTERS vs HIPSTERS. "We hate frameworks! Only hardcore, only JDK!" vs "Give me a framework for every step!"
  41. FRAMEWORKS IN BIG DATA: HAMSTERS vs HIPSTERS. Apache HBase is the top OLTP solution for Hadoop. Hive can provide an SQL connector to it: the simplest way to access your HBase data for analytics, but with significant overhead even compared to MapReduce access. Use HBase direct RPC for OLTP, MapReduce or Spark when you need performance, and Hive when you need faster implementation. Crazy idea: Hive running over HBase table snapshots.
  42. OUR BIG DATA FRAMEWORKS CRITERIA: FAST FEATURE DEVELOPMENT, ACTIVE COMMUNITY, STABLE REUSABLE ARCHITECTURE
  43. ETL: FRAMEWORKS COST ● We do object transformations when we do ETL from SQL to NoSQL objects. ● Practically any ORM framework eats at least 10% of CPU resources. ● Is that a small or a big amount? Depends on who pays... Diagram: SQL server (JOIN of Table1, Table2, Table3, Table4) feeding ETL streams into BIG DATA shards.
  44. 10% overhead... ● Single desktop application: computers usually have unused CPU power. A 10% overhead is barely noticeable, so the user accepts it. ● The user pays for electricity and hardware.
  45. 10% overhead... ● Lots of mobile clients. They can tolerate a 10% performance degradation. The application still works. ● All users pay for your 10% performance overhead.
  46. 10% overhead... ● Single server solution. OK, usually you have 10% spare. ● So you pay for the overhead, but you don't notice it until that capacity is needed. You still have the same 1 server.
  47. 10% overhead... ● A 10% overhead on 1000 servers with a properly distributed job means up to 100 additional servers are needed. ● These are your direct maintenance costs. IN CLUSTERS YOU DIRECTLY PAY FOR OVERHEAD WITH ADDITIONAL CLUSTER NODES.
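The arithmetic behind slide 47 can be sketched as follows, assuming the slide's simple linear model in which the overhead share of a fully loaded cluster translates directly into extra nodes. The function name `extra_nodes` is hypothetical.

```python
def extra_nodes(cluster_size, overhead_percent):
    """Nodes needed to absorb framework overhead, rounded up.

    Assumption: linear model from the slide, where overhead_percent of the
    cluster's capacity must be added back as whole nodes.
    """
    return -(-cluster_size * overhead_percent // 100)  # ceiling division


big_cluster_cost = extra_nodes(1000, 10)
small_cluster_cost = extra_nodes(15, 10)
```

The same 10% that disappears into spare capacity on one desktop or one server becomes a line item of about 100 machines on a 1000-node cluster.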
  48. WHAT FRAMEWORK IS REALLY GOOD FOR YOU? ● If you know the amount (and cost) of work needed to replace a framework, it is really good for you.
  49. MAKING YOUR OWN FRAMEWORK ● The most common reason for your own framework is … growing complexity and support cost. ● New framework development and migration can be cheaper than support of existing solutions. ● You don't want to depend on existing framework development.
  50. MAKING FRAMEWORK LAZY STYLE ● First build multiple solutions, then integrate them into a single approach. ● GOOD: You only integrate what is already used, so there is less unused work. ● BAD: You act reactively.
  51. MAKING FRAMEWORK PROACTIVE STYLE ● You improve the framework before the actual need. ● GOOD: You are guided by approach, not need, so you usually have a clearer design. ● BAD: You are more likely to build things that are not needed.
  52. OUTSIDE YOUR TEAM ● Great, you have additional workforce. But from now on you have external support tickets. ● Usually you can control your users, so major changes are still possible but harder. ● Pay more attention to documentation and training for other teams. It pays back.
  53. OUTSIDE YOUR COMPANY ● You receive additional workforce. People start contributing to your framework. Don't be so optimistic. ● Community support is good, but you need to support community applications. ● You are no longer flexible. You don't control the users of your framework.
  54. LESSONS LEARNED: CORE ● Avoid inventing a unique approach for every Big Data solution. It is critical to have good, relatively stable ground. ● Your Big Data CORE architecture should be a layered infrastructure constructed from small, simple, unified, replaceable components (the UNIX way). ● Be ready for packaging issues, but try to reuse as much as possible on the CORE layer.
  55. LESSONS LEARNED: BEYOND THE CORE ● When selecting frameworks to extend your big data core, prefer solutions with a stable approach, flexible functionality and a healthy community. Revise your approaches, as the world changes fast. ● Prefer to contribute to a good existing solution rather than start your own. ● The more frequently you change something, the higher-level tool you need for it. But in big data you directly pay for any performance overhead. ● If you have started your own framework, the more popular it is, the less freedom you have to modify it, so flexibility alone is a bad reason to start.
  56. Questions and discussion