Strata Conference NYC 2013
  • 1. Taewook Eom, Data Infrastructure Team, SK planet, 2014-01-28
  • 2. Taewook Eom: Data Programmer; Plaster (Planet Master) of Big Data Infra; Pre-Assessor of Hiring Programmers; Mentor of 101 Startup Korea; Twitter: @taewooke; LinkedIn:
  • 3. Santa Clara: Technical; New York (with Cloudera): Financial, Business; Europe: Privacy, Government; Boston: Medical. By O'Reilly: Web 2.0: Open, Sharing, Participation; Big Data: Making Data Work. Change the World with Data.
  • 4. Data: When hardware became commoditized, software became valuable. Now that software is being commoditized, data is valuable. – Tim O'Reilly, 2011. Data is like the blood of the enterprise. – Amr Awadallah, CTO at Cloudera, 2013
  • 5. What is Big Data? All data that is not a fit for a traditional RDBMS, whether used for OLTP or analytics purposes. (Session: Big Data Architectural Patterns)
  • 6. Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data - Gartner, 2011
  • 7.
  • 8. Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
  • 9. Data Science
  • 10. Big Data Open Mind!
  • 11. Big Data Gartner's 2013 Hype Cycle for Emerging Technologies (2013-08-19)
  • 12. More than half of the technical sessions were presented by Chinese or Indian speakers. 39 of the 125 sessions were sponsored sessions.
  • 13. Big Data: 4 approaches: Hadoop-based, RDB-based, search-based, NoSQL
  • 14. Real-time Processing (Session: Real-time Recommendations for Retail: Architecture, Algorithms, and Design)
  • 15. Real-time Stream Processing: gathering with Apache Kafka; processing with Apache Storm; querying via streaming, search-based, NoSQL, and SQL (Stinger/Tez, Shark)
  • 16. … not yet Graph Processing
  • 17. Big Data Space: No one tool is the right fit for all Big Data problems. Do not be afraid to recommend the right solution for the problem over the popular solution. To do this, you must be aware of the entire ecosystem. (Session: Big Data Architectural Patterns)
  • 18. Practical Performance Analysis and Tuning for Cloudera Impala
  • 19. Hadoop and the Relational Data Warehouse – When to Use Which?
  • 20. Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
  • 21. Ignite: Signal Detection Theory: Man vs Machine, Kyle Redinger, Co-Founder @VividCortex (5 minutes 6 seconds)
  • 22. Signal Detection Theory: Man vs Machine. Remove the obvious and look at what is important. Remember: less is more.
  • 23. Keynote: Towards Strata 2014, Roger Magoulas, Director of Market Research at O'Reilly Media (5 minutes 26 seconds)
  • 24. Towards Strata 2014
  • 25. Towards Strata 2014
  • 26. Towards Strata 2014
  • 27. Towards Strata 2014
  • 28. Science is fundamentally about data, but data is not fundamentally about science. – Douglas Merrill (ZestFinance), Beyond R and Ph.D.s: The Mythology of Data Science Debunked (8 minutes 9 seconds)
  • 29. People: A data scientist is a data analyst who lives in California. – George Roumeliotis (Intuit)
  • 30.
  • 31. Data Businessperson: businessperson, leader, entrepreneur. Data Creative: artist, jack-of-all-trades, hacker. Data Researcher: scientist, researcher, statistician. Data Engineer: engineer, developer.
  • 32. Scientists think they can code, software engineers think they are scientists. Team them up so they collaborate. – Scott Sorenson (Managing Big Data Reaching Back to the 11th Century with Hadoop)
  • 33. How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
  • 34. Data scientists spend their lives as data janitors instead of leveraging their skills. – Wes McKinney (DataPad), Building More Productive Data Science and Analytics Workflows
  • 35. Keynote: Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data, Foster Provost, Professor at the NYU Stern School of Business (10 minutes 16 seconds)
  • 36. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
  • 37. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
  • 38. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data. Predictive does not mean actionable. – Scott Sorenson (Managing Big Data Reaching Back to the 11th Century with Hadoop)
  • 39. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data. More data gives you more precision, not more prediction: use multiple datasets to reduce errors when measuring values. - Ravi Iyer (Using Graphs of Data to Understand your Customers, Users, and Employees)
  • 40. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
  • 41. Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
  • 42. Keynote: Big Impact from Big Data, Ken Rudin, Head of Analytics at Facebook (11 minutes 57 seconds)
  • 43. Big Impact from Big Data
  • 44. Hadoop is a hammer, but you need other tools along with it. – Josh Klahr (Pivotal), Designing Your Data-Centric Organization (12 minutes)
  • 45. Big Impact from Big Data: The way you organize information depends on the question you intend to ask of it. - Richard Saul Wurman (Session: Building a Data Platform)
  • 46. HaDump: loading data into Hadoop for no reason. (Session: Data Science Without a Scientist)
  • 47. Big Impact from Big Data: Technical people still don't understand the business needs of business people! Business people don't know what a table is. - Anurag Tandon (MicroStrategy), Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions
  • 48. Ask the Right Questions: Organizations already have people who know their own data better than mystical data scientists. Learning Hadoop is easier than learning the company's business. - Gartner, 2012 (Session: Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS)
  • 49. Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative
  • 50. Every Soldier is a Sensor: Countering Corruption in Afghanistan
  • 51. Big Impact from Big Data
  • 52. Big Impact from Big Data
  • 53. Big Impact from Big Data
  • 54. Value of Data: Usable < Useful < Actionable with Impact. If you can't answer "so what?", you only have facts, not insight. - Baron Schwartz (VividCortex Inc), Making Big Data Small. Descriptive (easy): What happened? Predictive (medium): What will happen? Prescriptive (hard): What should we do about it? (Session: Hadoop & Data Science for the Enterprise)
  • 55. The Future of Hadoop: What Happened & What's Possible?, Doug Cutting, Co-Founder of Hadoop (14 minutes 41 seconds) schedule/detail/31591. Big Data is the first industry that was created by open source. - Jack Norris (MapR Technologies), Separating Hadoop Myths from Reality. Hadoop: the kernel of the OS for data.
  • 56. Hadoop's Impact on the Future of Data Management Mike Olson (Cloudera)
  • 57. Single: S/W & H/W system, security model, management model, metadata model, audit model, resource management model. Common: storage & schema.
  • 58. The last generation of data management is not sufficient. More copies, representations, and transformations increase risk. Index once and reuse across workloads and the data lifecycle. NoSQL: indexing and updates for interactive apps. Hadoop: staging, persistence, and analytics. (Session: Data Governance for Regulated Industries Using Hadoop)
  • 59. Data Intelligence: Rethink How You See Data, Sharmila Shahani-Mulligan (ClearStory Data) (9 minutes 6 seconds)
  • 60. The Data Availability Problem: information supply chain stages: question, access, loading, data prep (too slow!), sampling, modeling, analysis & discovery, insight, presentation. (Session: Introducing a New Way to Interact with Insight)
  • 61. Running Non-MapReduce Big Data applications on Apache Hadoop
  • 62. Apache HBase for Architects; What's Next for Apache HBase: Multi-tenancy, Predictability, and Extensions
  • 63. Securing the Apache Hadoop Ecosystem
  • 64. An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB
  • 65. Schema: Information does not exist until a schema is defined and data is stored in a relational database. - anonymous (Session: Building a Data Platform)
  • 66. Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA)
  • 67. Managing a Rapidly Evolving Analytics Pipeline
  • 68. SQL on/in Hadoop/HBase solutions: Stinger/Tez, Shark (Session: Perception is Key: Telescopes, Microscopes and Data)
  • 69. All SQL on Hadoop Solutions are Missing the Point of Hadoop. Every solution makes you define a schema: SQL (Structured Query Language) is expressed over an assumed schema. Major reasons why Hadoop has taken off include the ability to load data without defining a schema, and processing data using schema-on-read instead of first defining a schema. Hadoop contains a lot of raw, granular data sets with potentially inconsistent schemas, and data sets in JSON, key-value, and other self-describing (non-relational) models designed for schema-on-read processing. SQL on Hadoop solutions that make you first define a schema are missing a major part of Hadoop's usage patterns. (Session: Flexible Schema and the End of ETL)
  • 70. Lessons Learned
  • 71. Hadoop Adventures At Spotify
  • 72. Hadoop Adventures At Spotify
  • 73. Quick prototyping is the fastest way to internal advocacy. Ship It! Cloud == Speed. We don't always need a complicated solution. KISS. Play to your differentiating strengths. Experience >> Data. Bias towards impact. It Takes a Village. EASE!! (Emulate, Analyze, Scale, Evaluate). (How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce) Prototyping is key to overcoming resistance to change. Technical architecture is heavily influenced by people organization. Developing a team of experienced Hadoop users can often be done using internal employees. A culture of experimentation and innovation yields the best results. (Managing Big Data Reaching Back to the 11th Century with Hadoop)
  • 74. Questions? SELECT questions FROM audience;
  • 75. References: Strata Conference + Hadoop World 2013: keynotes & interviews, slides & video, tweets (#strataconf)
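Slide 69 contrasts defining a schema up front with schema-on-read, where raw records are loaded as-is and each query imposes its own structure at read time. The sketch below illustrates that idea in plain Python, not Hadoop. It assumes raw JSON-lines records with inconsistent fields; the sample records and the `load_raw` and `read_with_schema` helpers are hypothetical and do not come from any of the sessions or from any Hadoop or SQL-on-Hadoop API.

```python
import json

# Hypothetical raw data as it might land in a data lake: JSON lines whose
# fields differ from record to record (no schema was declared at load time).
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart"}',
    '{"user": "c", "ms": "95", "referrer": "search"}',
]


def load_raw(lines):
    """Loading step: parse and keep records as-is, without validating a schema."""
    return [json.loads(line) for line in lines]


def read_with_schema(records, schema):
    """Schema-on-read: project each record onto the fields a query needs,
    casting types and filling missing fields with None at read time."""
    rows = []
    for rec in records:
        row = {}
        for field, cast in schema.items():
            value = rec.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


records = load_raw(RAW_LINES)

# Two queries apply two different schemas to the same raw records.
latency_view = read_with_schema(records, {"user": str, "ms": int})
page_view = read_with_schema(records, {"user": str, "page": str})

print(latency_view)
print(page_view)
```

The design choice being illustrated: nothing is rejected or transformed at load time, so the extra `referrer` field and the string-typed `"95"` survive in the raw store, and each reader decides how to interpret them. A schema-on-write system would instead force one table definition before any of these records could be stored.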