Michael Newberry


  • A personal view from where I sit. Picking examples unlikely to be used by other speakers. Hype: Freeform Dynamics on The Register: http://www.freeformdynamics.com/fullarticle.asp?aid=1590
  • Big Data. This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put ~1800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100PB of data. This was three years ago, and now these servers are almost obsolete. Big Data is in constant motion and growing at an incredible rate, with 90% of the world’s data generated in just the past two years. That's remarkable growth.
  • Doug Laney http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
  • Don’t forget: other kinds of machine learning. Bridge into
  • The need for these tools is motivated by the data explosion. “Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value with the word and sum. As an optimization, the reducer is also used as a combiner on the map outputs. This reduces the amount of data sent across the network by combining each word into a single record.”
  • Future of query processing. Pioneered in the Jim Gray Systems Labs by David DeWitt, PolyBase is a federated query processor in SQL Server 2012 Parallel Data Warehouse (PDW). It is a breakthrough over traditional query processing because it joins structured data with unstructured data from Hadoop. Without manual intervention, the PolyBase Query Processor can accept a standard SQL query and combine tables from a relational source with tables from a Hadoop source directly through external tables. It also imports and exports data to and from Hadoop in parallel, giving PDW speed, simplicity, and responsiveness in addressing these new types of queries. 1) Ability to issue standard T-SQL that joins relational data with unstructured data in Hadoop. 2) PolyBase rapidly imports/exports data between Hadoop and PDW in parallel. 3) PolyBase can query data in Hadoop directly without movement (with external tables). 4) Created in “Gray Systems Labs” by David DeWitt. http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
  • http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000002102 As the game was prepared for release, however, 343 Industries was faced with an entirely new kind of challenge: to gain insight into player behavior and user preferences. To achieve this goal, Microsoft leadership asked 343 Industries to find a way to effectively mine user data. At the same time, the team was faced with another need: analyzing data during the five-week Halo 4 “Infinity Challenge” tournament and providing results each day to their tournament partner, Virgin Gaming. The Halo 4 Infinity Challenge, the largest free-to-enter online Halo tournament in the world, tracked a player’s personal score in the game’s multiplayer modes across a global leaderboard, giving players a chance to win more than 2,800 prizes. Virgin Gaming needed to use business intelligence (BI) data gathered during the event to update leaderboards on the tournament website. “... the average length of a game and the specific game features that players use the most. By getting these insights, the Halo 4 team can make frequent updates to the game.” “Based on the user preference data we’re getting from Hadoop, we’re able to update game maps and game modes on a week-to-week basis,” says Vayman. “And the suggestions we get in the forums often find their way into the next week’s update. We can actually use this feedback to make changes and see if we attract new players. Hadoop and the forums are great tuning mechanisms for us.”
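The word-count job quoted in the notes above can be sketched in plain Python. This is only an in-memory illustration of the mapper, combiner, and reducer roles, not the Hadoop API: the function names (`mapper`, `reducer`, `run_word_count`) and the idea of passing each input split as a list of lines are assumptions made for the example.

```python
from itertools import groupby


def mapper(line):
    """Break one input line into words and emit a (word, 1) pair for each."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Sum the counts for one word and emit a single (word, total) pair."""
    return word, sum(counts)


def reduce_by_key(pairs):
    """Group (word, count) pairs by word and apply the reducer to each group."""
    ordered = sorted(pairs)
    return [
        reducer(word, (count for _, count in group))
        for word, group in groupby(ordered, key=lambda kv: kv[0])
    ]


def run_word_count(splits):
    """Simulate the job: each split stands in for the input of one map task."""
    shuffled = []
    for lines in splits:
        map_output = [kv for line in lines for kv in mapper(line)]
        # Combiner step: the reducer runs on each map task's local output,
        # so fewer records would cross the network in a real cluster.
        shuffled.extend(reduce_by_key(map_output))
    # Final reduce over the combined records from all map tasks.
    return dict(reduce_by_key(shuffled))


if __name__ == "__main__":
    print(run_word_count([["big data big"], ["data in the cloud"]]))
```

Reusing the reducer as the combiner works here because summing is associative and commutative, so partial sums per map task give the same totals as summing everything at the end.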

    1. 1. Extracting Value from Big Data in the Cloud - Michael Newberry
    2. 2. Big data in a Hybrid-Cloud world Dr Michael Newberry Windows Azure Lead, Microsoft UK Michael.Newberry@Microsoft.com
    3. 3. Doggerland: Simon Fitch, Vince Gaffney and Ken Thomson. Image Source: drowned-landscapes.tumblr.com. Royal Society's Summer Science Blog (http://summer-science.tumblr.com/)
    4. 4. Big Data.
    5. 5. VOLUME (Size), VARIETY (Structure), VELOCITY (Speed). Big Data.
    6. 6. Getting useful insights from awkward data sets using the most appropriate computing platform at each stage. Dr Michael Newberry Windows Azure Lead Microsoft UK
    7. 7. Big data in a Hybrid-Cloud world Dr Michael Newberry Windows Azure Lead, Microsoft UK Michael.Newberry@Microsoft.com
    8. 8. Machine Learning & Bayes theorem
    9. 9. … Amazon (AMZN) calls this homegrown math "item-to-item collaborative filtering," and it's used this algorithm to heavily customize the browsing experience for returning customers … Judging by Amazon's success, the recommendation system works. The company reported a 29% sales increase to $12.83 billion during its second fiscal quarter, up from $9.9 billion during the same time last year. A lot of that growth arguably has to do with the way Amazon has integrated recommendations into nearly every part of the purchasing process from product discovery to checkout. http://tech.fortune.cnn.com/2012/07/30/amazon-5/
    10. 10. “In theory there is no difference between theory and practice; in practice, there is”. Yogi Berra, cited in Nassim Taleb, Antifragile.
    11. 11. Big data techniques: NoSQL (à la MongoDB), Map-Reduce (e.g. Hadoop)
    12. 12. Embedded devices
    13. 13. Cloud OS
    15. 15. MANAGE ANY DATA, ANY SIZE, ANYWHERE
    16. 16. POLYBASE: COMBINING RELATIONAL AND NON-RELATIONAL DATA. The future of query processing
    17. 17. 19
    18. 18. 20
    19. 19. Lock-In: Windows Azure, Other Service Providers, Windows Virtual Machine, Customer Data Center
    20. 20. DATA PLATFORM DELIVERY MODELS. Rationale for Usage. Location: On-Premises; On-Premises or Service Provider; Microsoft Cloud or Service Provider
    21. 21. BALANCING ON PREMISE & CLOUD: Snowline graph
    22. 22. A
    23. 23. Takeaways: 1. “Big data” can do some amazing stuff. 2. Don’t think “big data” as much as “data needing non-relational approaches”. 3. If your big data insights are probabilistic, which they often are, have a plan to deal with variance. 4. Pick the most appropriate platform. Think “and” not “or”: balance public cloud AND on-premise; combine “big data” with RDBMS.
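Slide 8 names Bayes' theorem and takeaway 3 warns that big data insights are often probabilistic. A minimal sketch of why that matters is below. The function name `posterior` and every number in the usage example are illustrative assumptions, not figures from the talk.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(H | signal) from P(H), P(signal | H), P(signal | not H)."""
    # Total probability of seeing the signal at all.
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: 1% of customers want the item, the model flags 90%
    # of those who do, and also flags 5% of those who do not.
    p = posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)
    print(f"P(wants item | flagged) = {p:.3f}")
```

With these assumed inputs, most flagged customers still do not want the item, which is the kind of variance a plan has to absorb before acting on a probabilistic insight.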