The dawn of big data


  • Organizations everywhere now realize that there is immense insight and value locked inside their data, and new infrastructure and approaches to data analysis allow us to unlock that value
  • This is what’s happened in the last four decades.
  • These four factors also happen to be inputs for data generation processes.
  • Sizes that were unimaginable a few years ago are now commonplace. Just storing and accessing the data can be difficult. SIZE :: MANAGED WITH :: STORED
    Small :: Excel, R :: fits in memory on one machine
    Medium :: indexed files, monolithic DB :: fits on disk on one machine
    Big :: Hadoop, distributed DB :: stored across many machines
    Generally, data too big to fit on a disk is 'data-center' scale
  • Data that is difficult for computers to understand. The principal example is natural language text; also images, video, and more. Valuable info is locked up inside this data (e.g. Twitter)
  • More data coming in faster. Decision windows are getting smaller. Valuable to worthless in a matter of minutes (seconds … no, milliseconds)
  • Source: Architecture for Big Data Analytics: MarkLogic white paper
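The size tiers in the notes above (small fits in memory, medium fits on one disk, big spans many machines) can be sketched as a tiny routing helper. The thresholds and tool names below are illustrative assumptions, not prescriptions from the slides:

```python
def storage_tier(size_bytes, ram_bytes=16 * 2**30, disk_bytes=4 * 2**40):
    """Map a data-set size to the tier described in the notes.

    Thresholds are illustrative: 'small' fits in one machine's RAM,
    'medium' fits on one machine's disk, 'big' needs many machines.
    """
    if size_bytes <= ram_bytes:
        return "small: Excel / R, fits in memory on one machine"
    if size_bytes <= disk_bytes:
        return "medium: indexed files or a monolithic DB, fits on one disk"
    return "big: Hadoop or a distributed DB, stored across many machines"

print(storage_tier(500 * 2**20))   # 500 MB -> small
print(storage_tier(100 * 2**40))   # 100 TB -> big
```

The point of the sketch is only that the tier boundaries are machine-relative: as RAM and disks grow, yesterday's "big" becomes today's "medium".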

    1. THE DAWN OF BIG DATA: New Rules; New Structures. Neal J. Hannon, University of Kansas, February 9, 2012
    2. Data Mania • n4&feature=player_embedded
    3. Definition • Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set.
    4. More Data Please…
    5. • In a 2001 research report[14] and related conference presentations, then META Group (now Gartner) analyst Doug Laney defined data growth challenges (and opportunities) as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources). Gartner continues to use this model for describing big data.[15]
    6. Gartner • Worldwide information volume is growing at a minimum rate of 59 percent annually, and while volume is a significant challenge in managing big data, business and IT leaders must focus on information volume, variety and velocity. • Volume • Variety • Velocity
    7. Volume • Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.
    8. Variety • Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions; now there are more types of information to analyze, mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.
    9. Velocity • Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
    10. Why now? There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing. (Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010) Data is becoming the new raw material of business: an economic input almost on a par with capital and labour. "Every day I wake up and ask, 'how can I flow data better, manage data better, analyse data better?'" says Rollin Ford, the CIO of Wal-Mart. (Source: Data, Data Everywhere, The Economist, February 25, 2010)
    11. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
    12. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
    13. How can Big Data create value? • Creating transparency: enabling, for example, the manufacturing sector to integrate "data from R&D, engineering, and manufacturing units to enable concurrent engineering ... (to) significantly cut time to market and improve quality." This seems much like traditional data warehousing.
    14. How can Big Data create value? • Enabling experimentation: "organizations can collect more accurate and detailed performance data ... to instrument processes and then set up controlled experiments … (which) can enable leaders to manage performance at higher levels." Super-crunching equals analytics + experiments.
    15. How can Big Data create value? • Innovating new business models: "The emergence of real-time location data has created an entirely new set of location-based services, from navigation to pricing property and casualty insurance based on where, and how, people drive their cars." This affirms Mike Loukides' assertion "that data science enables the creation of data products."
    16. How can Big Data create value? • Supporting human decision making with automated algorithms: "decision making may never be the same; some organizations are already making better decisions by analyzing entire datasets from customers, employees, or even sensors embedded in products." The statistical learning world continues to progress.
    17. SAS - unstructured text • NHAq8jG4FX4&feature=pyv&ad=8557352196&kw=data%20analytics
    18. Pattern-Based Strategy • "The ability to manage extreme data will be a core competency of enterprises that are increasingly using new forms of information — such as text, social and context — to look for patterns that support business decisions in what we call Pattern-Based Strategy," said Yvonne Genovese, vice president and distinguished analyst at Gartner. "Pattern-Based Strategy, as an engine of change, utilizes all the dimensions in its pattern-seeking process. It then provides the basis of the modeling for new business solutions, which allows the business to adapt. The seek-model-and-adapt cycle can then be completed in various mediums, such as social computing analysis or context-aware computing engines."
    19. Pattern-Based Strategy • g&feature=BFa&list=UUSNX50LYGXWV_e5U WZGPGbw&lf=plpp_video
    20. EMC’s Big Data Video • O’Reilly’s Take • 0&feature=related
    21. Tricks of the Trade • New Architecture • In-Memory Analytics
    22. In-Memory Indexing at SAP • We also have enterprise search; we really started doing that back in the 2003/2004 time period. That's also when we started coming out with the Business Warehouse Accelerator, when Google was just really starting to become Google, and we tried to do the same thing with enterprise data that Google does with website data as far as indexing it. We also put the indexes in memory, so it's sped up even further, and HANA really is the next evolutionary step in that chain. This is in-memory processing, and it isn't something just for a specialist; it really is a technology that's matured to a level where it can run the entire business suite, run your entire company in-memory, and get all those benefits for everything. • %20Chapter%20of%20In-Memory%20Computing_PT_12.22.11.pdf
    23. For more on HADOOP • architecture
    24. Obligatory Questions slide • Any Questions?
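The in-memory indexing idea in the SAP slide (keep the search index entirely in RAM, the way web search engines index websites) can be illustrated with a toy inverted index. This is a minimal sketch of the general technique, not SAP's or Google's actual implementation; class and method names are invented, and tokenization is deliberately naive:

```python
from collections import defaultdict

class InMemoryIndex:
    """Toy inverted index held entirely in RAM: term -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive whitespace tokenization; real indexers stem, normalize, etc.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND-semantics: ids of documents containing every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = self.postings[terms[0]].copy()
        for term in terms[1:]:
            result &= self.postings[term]
        return result

idx = InMemoryIndex()
idx.add(1, "big data needs new infrastructure")
idx.add(2, "in-memory analytics speeds up big data queries")
print(idx.search("big data"))  # → {1, 2}
```

Because every lookup is a dictionary hit plus set intersections in RAM, queries avoid disk I/O entirely, which is the speed-up the slide attributes to putting indexes in memory.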