
Webinar on Big Data Challenges : Presented by Raj Kasturi


Big data is huge: billions upon billions of data sets, and the need to analyze them and apply the results to real-life problem solving is a challenge. Are traditional methods successful in solving big data problems?

Let’s take a look at the current state of big data and ask whether traditional methodologies are providing the necessary answers quickly enough. Is Agile/Scrum a good fit for big data?

– big data in any industry
– high data availability, real-time analytics, data warehousing
– the Agile spectrum: where do my projects fall?
– big data complexity and empirical process control theory
– current industry trends
– metrics

Published in: Engineering


  1. Is Scrum a good fit for solving big data challenges? Speaker – Raj Kasturi. September 19th, 2017, 10:00 to 11:00 AM EST (7:30 to 8:30 PM IST). Special Thanks to:
  2. Raj Kasturi, MBA • 25+ years of IT experience, with eight-plus years of enterprise-level Agile experience • Agile experience as an Agile Coach, Scrum Trainer, and Scrum Master • Leading and helping large-scale Agile project transitions • Adjunct faculty at Pennsylvania State University, Pennsylvania, USA • 18+ years of teaching experience in Technology and Project Management, and 8 years teaching Scrum and Agile courses • Started my career as a programmer; worked as an App. Dev. Manager • Speaker and volunteer at Agile conferences and user groups • Servant Leader – Agile World, User Group, Scrum Alliance • My website/blog: @AgileRaj
  3. Agenda • What is big data? • The three V’s of big data • Big Data Trends of 2017 • The Agile Spectrum • Big data complexity and empirical process control theory • Scrum and Big Data • Summary
  4. What is big data? ▪ The term big data was coined in the late 1990s ▪ Big data is different from regular data ▪ Billions of data sets and their interactions ▪ Traditional RDBMS is designed for regular data ▪ RDBMS cannot handle big data ▪ Big data requires a new technological approach to handling and processing ▪ New data platforms are needed to meet storage and performance requirements
  5. The 3 V’s of big data: Volume, Velocity, Variety. Are these three factors required to drive the need?
  6. Add Value ▪ Do we have a fourth V: Value? ▪ Aggregate the data to provide value
  7. Google’s flu tracker ▪ Knowing the what, rather than the why, was good enough ▪ 2009 H1N1 flu epidemic ▪ Real-time flu tracker “Google Flu Trends” ▪ Flu sufferers google their symptoms before visiting a clinic ▪ Search queries yielded optimized, accurate, real-time data ▪ Because of its size, the data was far more effective than the CDC’s ▪ 3 billion searches a day ▪ Large servers and clever algorithms to sort the data
  8. Who uses it? ▪ Financial Services ▪ Telecommunications ▪ Energy ▪ Government ▪ Retail ▪ And many more…
  9. Big Data Complexity [diagram]
  10. Agile Spectrum
  11. Process: Input → Output (may have internal processes)
  12. Defined Process: Input → Output (may have internal processes). Composition known; characteristics well defined. • Sequential series of steps • Underlying process well understood • Results repeatable/predictable • Command-and-control approach • Pre-defined variations are acceptable
  13. Empiricism
  14. Empirical process control • The process is a black box with inputs and outputs: the problem cannot be fully understood or defined, the solution evolves as information becomes known, and it needs frequent measurement • Transparency: significant aspects of the process must be visible to those responsible for the outcome • Inspection: frequently inspect and remove any unacceptable variations • Adaptation: adjust and control the process; improve • Protect the black box by not adding anything new!
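The inspect-and-adapt cycle on the empiricism slide can be sketched in code. This is a minimal, hypothetical illustration (the noisy `black_box` function, target, and gain are all made up): we never look inside the process, we only measure its output and adjust its input.

```python
import random

random.seed(0)  # make the noisy process repeatable for this sketch

def black_box(setting):
    # Unknown internal process: we only observe inputs and outputs.
    return setting * 2 + random.uniform(-1, 1)

target = 10.0   # the outcome we want
setting = 1.0   # the one input we control

for _ in range(20):                 # frequent, short inspection cycles
    observed = black_box(setting)   # transparency: the output is measured
    error = target - observed       # inspection: compare against the goal
    setting += 0.2 * error          # adaptation: adjust the input, nothing else

# After repeated cycles the setting converges near target / 2 = 5,
# even though we never understood the process internals.
```

The same loop is what a Scrum team runs at sprint boundaries: measure the increment, compare it with the goal, and adjust the plan rather than the underlying "black box."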
  17. Hadoop’s Distributed File System (HDFS). Source: Managing Big Data Workflows for Dummies. MapReduce: think of it as a framework that processes and reduces raw big data into regular-size, tagged datasets that are much easier to work with.
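The map → shuffle → reduce pipeline described above can be illustrated with a minimal in-memory word count, the canonical MapReduce example. This is a hypothetical sketch, not Hadoop itself: in a real cluster the shuffle/group step and the distribution of work across nodes are handled by the framework.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a raw input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each group into one small, tagged, easy-to-query record.
    return {key: sum(values) for key, values in groups.items()}

raw_data = ["big data is huge", "big data needs new tools"]  # stand-in for HDFS blocks
pairs = list(chain.from_iterable(map_phase(line) for line in raw_data))
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # → 2
```

Each phase only sees key-value pairs, which is what lets the framework split the raw data across many machines and still combine the results into a regular-size dataset.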
  18. Popular platforms and tools ➢ Pig ➢ Apache Hive ➢ Apache Sqoop ➢ In-memory databases ➢ NoSQL databases ➢ Massively Parallel Processing (MPP) ➢ Cassandra ➢ Hadoop ➢ Plotly ➢ Bokeh ➢ Neo4j ➢ Cloudera ➢ OpenRefine ➢ Storm
  19. Scrum and Big Data ➢ Scrum’s ability to measure work output: velocity ➢ Knowledge is based on the ability to measure a given phenomenon ➢ Once we can measure it, we can start to manipulate the input and determine from the resulting output whether we have improved something: the inspect-and-adapt concept ➢ We have discussed empiricism, and Scrum is based on empirical process control ➢ Continuous improvement
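The velocity measurement mentioned above is simple arithmetic, and a short sketch makes the "measure, then adapt the forecast" idea concrete. The sprint history and backlog size below are made-up numbers; the rolling-average window is one common convention, not part of the Scrum Guide.

```python
import math

# Hypothetical sprint history: story points completed in the last four sprints.
completed_points = [21, 18, 25, 23]

def velocity(history, window=3):
    # Velocity as the average output of the most recent `window` sprints.
    recent = history[-window:]
    return sum(recent) / len(recent)

def sprints_remaining(backlog_points, history):
    # Forecast derived from measured output rather than from an up-front plan.
    return math.ceil(backlog_points / velocity(history))

print(velocity(completed_points))               # → 22.0
print(sprints_remaining(110, completed_points)) # → 5
```

Because the forecast is recomputed from actual output every sprint, it self-corrects as the team inspects and adapts, which is exactly the empirical behavior big data projects need.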
  20. Top 10 Big Data Trends of 2017 (1–5) 1. Big data becomes fast and approachable: options expand to speed up Hadoop 2. Big data is no longer just Hadoop: purpose-built tools for Hadoop become obsolete 3. Organizations leverage data lakes from the get-go to drive value 4. Architectures mature to reject one-size-fits-all frameworks 5. Variety, not volume or velocity, drives big-data investments
  21. Top 10 Big Data Trends of 2017 (6–10) 6. Spark and machine learning light up big data 7. The convergence of IoT, cloud, and big data creates new opportunities for self-service analytics 8. Self-service data prep becomes mainstream as end users begin to shape big data 9. Big data grows up: Hadoop adds to enterprise standards 10. The rise of metadata catalogs helps people find analysis-worthy big data
  22. Summary: Scrum is a good fit for work that has a fair degree of complexity, requires innovation and invention, and demands product differentiation, productivity, and a faster launch to market. I say that big data needs all of the above.
  23. Attributions 1. The Scrum Guide, 2016 2. JJ Sutherland 3. Managing Big Data Workflows for Dummies, BMC Software special edition, by Joe Goldberg and Lillian Pierson, PE 4. “Top 10 Big Data Trends for 2017,” Tableau