Big Data Visualisation with Hadoop and PowerPivot

The slides don't cover the demo session, but they give an introduction to Microsoft and Big Data that people might find interesting.

  • Courtesy of Bruno Aziza at @SiSense
  • Relational databases are pushed to the limit. Data management techniques haven't scaled. Traditional systems haven't scaled. Big data is about complexity as well as scalability. NoSQL is a paradigm shift. Hadoop can run and parallelise large-scale batch computations on large amounts of data. However, there is high latency in returning the results, so it is not suitable for low-latency workloads.
    What are the features of a Big Data system? Robust; fault tolerant; human fault tolerant; data when you need it; scalable; general; extensible; reduced implementation complexity; error handling; auditing. This is no different from a little data solution. Think inserts.
  • Some things in life are so complicated and abstract that they’re awesome. Eternity, cosmic significance, and the infinite universe are just a few of these awesome, convoluted concepts that have kept us fascinated and confused since the beginning of human consciousness. Awe is perceptual expansion: such perceptual vastness that you literally have to reconfigure your mental schemata just to accommodate, just to take in, the scale of the experience. It is an ontological awakening, a realization of the connectedness of all things, and also of the continuum from inanimate to animate matter; all of it is nature, all of it is inevitable, all of it is emerging as part of the same evolutionary process. Physicist Freeman Dyson speaks of a future where a new generation of artists will write genomes the way that Shakespeare used to write verses.
  • Courtesy of WIPRO
  • Teradata and Lynn Langit slide. We’ve got 7 billion people and 6 billion devices. 90% of the world’s data was created in the last two years alone. This is not the data that’s kept behind corporate walls; it is unstructured content, most of which didn’t even exist years ago: documents, tweets, images, videos posted to YouTube, data gathered from surveillance cameras. We post, we blog, we share, we tweet, we like or don’t like. We have a voice and we leave a digital trail. And every tweet we send is being followed, monitored, analyzed, acted on. Companies are analyzing social media to find out what you’re thinking, to know what new products and services you want even before you do. A new initiative by the U.N. is actually using sentiment analysis to help predict civil unrest, job losses, spending reductions, and disease outbreaks.
  • Digital marketing optimisation – golden path analysis, clickthroughs. Digital exploration – discovery, new markets. Machine-generated analytics – logs, real time, telemetry, location, remote sensors. Data retention – archiving.
    Traditionally: physics experiments, sensor data, satellite data, …
    Now: operational logs, customer behavior, social interactions online, …
    From terabytes in the 1990s, through petabytes today, to zettabytes in the future.
  • What do we have now? It is like a vacuum tube: slow and expensive. Why did Big Data get big?
  • Volume – data comes in one size: large. Variety – structured and unstructured data. Veracity – good and bad data. Velocity – fast moving. Value – business value.
  • Unlike real crude oil, data can be re-used. It can be mined for profit. It needs to be re-shaped in order to be used. If you don’t have your data, you don’t have anything! You lose your business.
  • Thanks to @SiSense and Bruno Aziza
  • Actionable Insight. Predictive Insight. Business Impact. Customer Discernment.
  • Big Data. This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put ~1800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100PB of data. This was three years ago, and now these servers are almost obsolete.
    Big Data is in constant motion and growing at an incredible rate, with 90% of the world’s data generated in just the past two years. That's remarkable growth. Technology history has taught us that the one with the most data wins. The empires of data like Twitter, Facebook, and Yahoo are all able to capitalize on the notion that data equates to power. More and more companies are utilizing Hadoop to power Big Data analytics and drive revenue and profit. It’s all about your data.
  • Some examples of organizations that are delivering new value in the form of revenue growth, cost savings, or entirely new business models. Yahoo - AS with Hive; Klout - AS with Hive (white paper); GE - Hive analytics.
    Yahoo! (Gartner BI Excellence Award winner) is driving growth for existing revenue streams: Yahoo! manages a powerful, scalable advertising exchange that includes publishers and advertisers. Advertisers want to get the most out of their investment by reaching their targeted audiences effectively and efficiently. Yahoo! needs visibility into how consumers are responding to ads along many dimensions (websites, creative, time of day, gender, age, location) to make the exchange work as efficiently and effectively as possible. Yahoo! doubled its revenue by allowing campaign managers to “tune” campaign targeting and creative. Yahoo! drove an increase in spending from advertisers since they got better performance by advertising through Yahoo!. Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time.
    Klout is creating new businesses and revenue streams: Klout’s mission is to help everyone understand and leverage their influence. Klout uses Big Data to unify the social web (consumers, brands, and partners) with social networking and activity, along with data to generate a Klout score and enable analysis, targeting, and social graphs. It helps consumers manage their “social brand.” It helps brands reach influencers at scale. It helps data partners enhance their services (customer loyalty, CRM, media and identity, and marketing). For example, the Palms uses Klout scores in addition to their normal customer rewards program to determine whether or not to upgrade their customers to a better room during their stay. The Huffington Post uses Klout to help serve the best curated Twitter content.
    Klout case study: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Klout/Data-Services-Firm-Uses-Microsoft-BI-and-Hadoop-to-Boost-Insight-into-Big-Data/710000000129
    Case study on Thailand’s Department of Special Investigations: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Department-of-Special-Investigation/Thai-Law-Enforcement-Agency-Optimizes-Investigations-with-Big-Data-Solution/710000001175
    GE is driving operational efficiencies: GE is running several use cases on its Hadoop cluster while incorporating several disparate sources to produce results. Along with sentiment analysis, GE is running web analytics on its internal cloud structure and looking at load usage, user analytics, and failure mode analytics. GE built a recommendation engine for its intranet that suggests press releases users might be interested in based on their function, user profiles, and prior visits to its site. GE is working with several types of remote monitoring and diagnostic data from energy and wind businesses.
  • Business Users need data. There is a paradigm shift towards it, despite what the cartoon says.
  • Processing platform for Big Data processing, using the “Map-Reduce” processing paradigm. When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off the shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.
  • ACID – Atomicity, Consistency, Isolation, Durability
  • Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, processing options break down broadly into a choice between massively parallel processing architectures (data warehouses or databases such as Greenplum) and Apache Hadoop-based solutions. This choice is often informed by the degree to which one of the other "Vs", variety, comes into play. Typically, data warehousing approaches involve predetermined schemas, suiting a regular and slowly evolving dataset. Apache Hadoop, on the other hand, places no conditions on the structure of the data it can process.
  • I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.
  • Big Data, in terms of data volume, variability, and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't solve business objectives. Businesses don't have a Hadoop problem; they have analytics, pattern mining, trend analysis, statistical inference, economic modeling, and market regression problems. Data science starts where utility-class services like Big Data and Hadoop end. The real opportunity is to expose data science to everyone.
    As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically-trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I and Microsoft see opportunity. Essentially: wouldn't it be cool if mere mortals could use this stuff and consume insights that are coming directly from Hadoop?
    Microsoft HDInsight enables you to gain insight from virtually any data, connect with the world of data, improve decision making, and enhance the development of the next generation of products and services. Nearly everyone in your organization can analyze and make more informed decisions with the right tools. PowerPivot for Microsoft Excel and Power View for SharePoint give nearly all users a view into structured and unstructured data. With the Hive Add-in for Excel and Hive ODBC Driver, almost anyone in your organization can directly access Hadoop data from end-user tools. Hadoop simplifies programming for developers with JavaScript for MapReduce jobs. The JavaScript implementation can also reduce your code by up to 10 times compared to Java.
  • The second thing I want to talk about is Hadoop and how Hadoop is set up to deliver breakthrough insights from your data. How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud?
    When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off the shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.
  • There are other talks that will go into Big Data and Hadoop so we’ll only do a quick overview of that right now. We’ll spend most of our time on Hive.
  • Data democracy
  • Ask the audience first.
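
The notes above describe map reduce as a pattern for processing lots of unstructured data in a fault tolerant, scalable way. As a rough illustration only, the map, shuffle, and reduce steps can be sketched in plain Python as a word count. This is a single-process toy and not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for this example, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample input for illustration.
    sample = ["big data is big", "data is the new oil"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps could in principle be spread over many commodity machines, which is the property the notes attribute to Hadoop.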

Big Data Visualisation with Hadoop and PowerPivot: Presentation Transcript

  • Website: http://www.jenstirrup.com Twitter: @jenstirrup Email: Jen.Stirrup@copper-blue.com
  • Data Scientists. You’re Incredible! And you are… People who like data to be correct.
  • Agenda
  • How did Big Data get Big?
  • As long as you’re gonna be thinking anyway, why not think big. (Donald Trump)
    Because we can imagine, we are free. (Jean-Paul Sartre)
    What kind of modern world would we have if Edison, Green and Dixon had not developed cinematic technology before Hitchcock grew up? (Kevin Kelly, futurist)
  • The Unknown Unknowns
    • That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know. (Donald Rumsfeld)
  • Data Scientists. You’re Incredible!
  • Examples of Big Data
  • Big Data Takeaways. V is for:
  • Data is Black Gold
  • What does it mean for Enterprises?
  • Agenda
  • Big Data.
  • Data Management Strategy: OLTP; Single-Purpose DW; Multi-Purpose DW; MapReduce. Compute Trend
  • Increases ad revenue by processing 3.5 billion events per day
    Massive Volumes: Processes 464 billion rows per quarter, with average query time under 10 secs.
    Measures and ranks online user influence by processing 3 billion signals per day
    Cloud Connectivity: Connects across 15 social networks via the cloud for data and API access
    Uses sentiment analysis and web analytics for its internal cloud
    Real-Time Insight: Improves operational decision making for IT managers and users
  • Hadoop is for Big Data.
  • What is Hadoop? “Flexible and Available Architecture for Large Scale computation and data processing on a network of highly available commodity hardware.”
  • Hadoop’s Lineage. * Resource: Kerberos Konference (Yahoo) – 2010
  • HDInsight Ecosystem: Distributed Storage (HDFS); Distributed Processing (Map Reduce); ODBC (Azure Data Marketplace); Windows Azure Storage
  • Hadoop Key Terms
  • Hadoop Capabilities: Machine Learning; Graph Processing; Distributed Compute; Extract Load Transform; Predictive Analysis
  • Why Hadoop? Open Source Software and Commodity Hardware = Reduction of Costs for IT
  • Hadoop vs RDBMS
    Apache Hadoop isn’t a substitute for a database
    • It is not Relational
    • Key Value pairs
    • Big Data
  • Hadoop vs RDBMS
    • Unstructured / Semi structured
    • Structured
    • Works together with RDBMS
  • Data, Knowledge, Action. HDInsight
  • How can Microsoft help?
  • ...Bringing home all this technology, all your data in familiar packages
  • Big Agenda
  • BIG DATA REQUIRES AN END-TO-END APPROACH
    INSIGHT: Self-Service, Collaboration, Corporate Apps, Devices
    DATA ENRICHMENT: Discover, Combine, Refine
    DATA MANAGEMENT: Relational, Non-relational, Analytical, Streaming
  • Data, Knowledge, Action. HDInsight
  • Microsoft Hadoop Vision
    Runs on Windows and Azure
    • Active Directory
    • System Center
    • .NET Programmability
    Microsoft Data Connectivity
    • SQL Server / SQL Parallel Data Warehouse
    • Azure Storage / Azure Data Market
  • Microsoft Hadoop Vision
    Microsoft Business Intelligence
    • Hive ODBC Connectivity
    • BI Tools for Big Data
    Collaborate with and Contribute to OSS
    • Collaborate with HortonWorks
    • Provide improvements and Windows support back to OSS
  • On Premise
    • Comes with:
    • Hadoop command line (shell)
    • Hadoop Status for name node and map-reduce cluster
    • HDInsight Dashboard
  • On Premise
    • On prem: http://www.microsoft.com/bigdata/
    • Single node cluster (onebox) install
    • C:\hadoop
    • Starts local services
  • On Azure
    • On Windows Azure: http://HadoopOnAzure.com/
    • 3 node cluster running as a service in Azure
    • Can be used for 5 days
    • Provides samples and HDInsight Dashboard
    • TAP Program
  • Agenda
    • Big Data – What is it?
    • Big Data or Big Hype?
    • Big Data, Big Insights with Hadoop
  • Because we can imagine, we are free. (Jean-Paul Sartre)
    We have the tools. All we’ve got to do is imagine what could be. We can reinvent the present; we can transform the world around us. (Jason Silva)
  • Recap
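
The "Hadoop vs RDBMS" slides say that Hadoop works with key value pairs over unstructured or semi structured data, and the notes add that it places no conditions on the structure of the data it can process. A minimal sketch of that idea, assuming some made-up log records: structure is interpreted only when a record is read, and each usable record is turned into a (key, value) pair. The record formats, the `RAW_RECORDS` list, and the function name `read_as_key_value` are all hypothetical and are not taken from the slides or from any Hadoop API.

```python
import json
from typing import Iterator, List, Tuple

# Made-up raw input: two JSON records, one pipe-delimited record, one junk line.
RAW_RECORDS: List[str] = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "view"}',
    "carol|purchase|19.99",
    "not a parsable record",
]


def read_as_key_value(line: str) -> Iterator[Tuple[str, str]]:
    """Interpret one raw line at read time and emit a (user, action) pair.

    Lines that match neither assumed format are skipped instead of being
    rejected at load time, as a fixed relational schema would do.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        record = None

    # JSON object with the two fields this example cares about.
    if isinstance(record, dict) and "user" in record and "action" in record:
        yield record["user"], record["action"]
        return

    # Fallback: pipe-delimited text, assumed to be user|action|...
    parts = line.split("|")
    if len(parts) >= 2:
        yield parts[0], parts[1]


def to_pairs(lines: List[str]) -> List[Tuple[str, str]]:
    """Apply the reader to every line and collect the emitted pairs."""
    return [pair for line in lines for pair in read_as_key_value(line)]


if __name__ == "__main__":
    for key, value in to_pairs(RAW_RECORDS):
        print(key, value)
```

A relational table would need these fields declared up front; here the same raw lines could be re-read later with a different interpretation, which is one way to picture Hadoop working together with an RDBMS instead of replacing it.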