The document discusses Microsoft's investment in and vision for big data and Apache Hadoop. Microsoft is delivering Apache Hadoop running on both Windows Server and Windows Azure, as part of its open source commitment. It is also integrating Hadoop with Microsoft technologies like Active Directory, SQL Server, and Azure Storage to provide better experiences on Windows and Azure.
Big Data is one of the hot topics and has got the attention of the IT industry globally. It is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. More accurate analyses may lead to more confident decision making. And better decisions can mean greater operational efficiencies, cost reductions and reduced risk.
This presentation focuses on the why, what, and how of big data as we explore some of Microsoft's big data solutions, the Azure HDInsight service and Power BI, providing insights into the world of big data.
How to boost your data management with Dremio? - Vincent Terrasi
Works with any source: relational, non-relational, third-party apps. Five years ago nobody was using Hadoop or MongoDB, and five years from now there will be new products. You need a solution that is future proof.
Works with any BI tool. In every company multiple tools are in use, and each department has its favorite. We need to work with all of them.
No ETL, data warehouses, or cubes. It needs to give you a genuinely good alternative to these options.
Makes data self-service, collaborative. Probably most important of all, we need to change the dynamic between the business and IT. We need to make it so business users can get the data they want, in the shape they want it, without waiting on IT.
Makes Big Data feel small. It needs to make billions of rows feel like a spreadsheet on your desktop.
Open source. It’s 2017, so we think this has to be open source.
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013 - Jen Stirrup
This session focused on data visualisation using Power BI, based on big data. Some examples of Hive and HDFS file storage are given. An overview of Microsoft HDInsight is supplied.
Apache Hadoop is a platform that has emerged to help extract insight from all that data. In this session, you will learn the basics of Hadoop, how to get up and running with Hadoop in the cloud using Microsoft Azure HDInsight, and how you can leverage the deeper integration of Visual Studio to integrate Big Data with your existing applications. No previous experience with Hadoop is required.
Presented @ MSDEVMTL on Saturday, February 2015
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...) - Stéphane Fréchette
How is Big Data moved around? How are you planning to move it?
This session will focus on familiar and not-so-familiar tools you can use today for moving and integrating Big Data, and will also outline the underlying technologies and platform (an introduction to Big Data, Hadoop, HDInsight and tools). We will compare and outline the options, discuss how they can work with your existing Hadoop and Windows Azure environment, and provide some guidance on when and how to use each of these tools.
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data... - Databricks
How did Devon move from a traditional reporting and data warehouse approach to a modern data lake? What did it take to go from a slow and brittle technical landscape to a flexible, scalable, and agile platform? In the past, Devon addressed data solutions in dozens of ways depending on the user and the requirements. Through a visionary program, driven by Databricks, Devon has begun a transformation of how it consumes data and enables engineers, analysts, and IT developers to deliver data-driven solutions along all levels of the data analytics spectrum. We will share the vision, technical architecture, influential decisions, and lessons learned from our journey. Join us to hear the unique Databricks success story at Devon.
Here I talk about examples and use cases for Big Data & Big Data Analytics and how we accomplished massive-scale sentiment, campaign and marketing analytics for Razorfish using a collection of database, Big Data and analytics technologies.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
Philip Howard, industry analyst and database technology expert from Bloor Research International will present recent market research results and discuss the best and latest solutions as well as provide advice for identifying the right match for specific use cases. DataStax will also share case studies where DSE Graph technology is being applied to transform the customer experience in industry sectors such as Financial Services, Retail, Telecommunications, Logistics, Media and Entertainment. Attend this webinar to find out more about graph database technology, all the choices on the market today and how you can transform your own technical solutions and customer experience.
Webinar recording: https://youtu.be/s0Hozx_bdZ4
For current and on-demand DataStax webinars, visit: http://www.datastax.com/resources/webinars
To transform your organization and unlock the value of your data, you need a way to ingest, store and analyze every type of data in your organization.
This presentation covers the Data Access Layer of the Hadoop Ecosystem which enables you to achieve this.
We will use the HDP (Hortonworks Data Platform) reference architecture to walk through the Hadoop core and its ecosystem with focus on the data access layer.
We will cover some of the prominent tools of the ecosystem such as Pig, Hive, Sqoop, Flume and Oozie and how they are used for ingesting data into Hadoop from structured, unstructured and streaming sources.
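These tools are driven by scripts and configuration rather than application code, but the kind of aggregation a Hive query expresses (for example, a GROUP BY over ingested rows) can be sketched in plain Python. The rows and function below are hypothetical stand-ins for illustration, not any of these tools' actual APIs:

```python
# Toy illustration of the aggregation Hive would express as
# SELECT dept, COUNT(*) FROM staff GROUP BY dept,
# written in plain Python so the logic is visible.
from collections import Counter

raw_rows = [
    "alice,engineering",
    "bob,sales",
    "carol,engineering",
]

def group_count(rows):
    """Count rows per department, mimicking a GROUP BY aggregation."""
    counts = Counter()
    for row in rows:
        _name, dept = row.split(",")
        counts[dept] += 1
    return dict(counts)
```

In a real cluster the same statement would run as distributed jobs over data that Sqoop or Flume had already landed in HDFS.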
Talk to us at +91 80 6567 9700 or send an email to training@springpeople.com for more information.
How to get started in Big Data without Big Costs - StampedeCon 2016 - StampedeCon
Looking to implement Hadoop but haven’t pulled the trigger yet? You are not alone. Many companies have heard the hype about how Hadoop can solve the challenges presented by big data, but few have actually implemented it. What’s preventing them from taking the plunge? Can it be done in small steps to ensure project success?
This session will discuss some of the items to consider when getting started with Hadoop and how to go about making the decision to move to the de facto big data platform. Starting small can be a good approach when your company is learning the basics and deciding what direction to take. There is no need to invest large amounts of time and money up front if a proof of concept is all you aim to provide. Using well known data sets on virtual machines can provide a low cost and effort implementation to know if your big data journey will be successful with Hadoop.
Hydra - Content Processing Framework for Search Driven Solutions - Findwise
Presented at Lucene Revolution, 7-8 May in Boston and Berlin Buzzwords 4-5 June, 2012.
When working with free text search, the quality of the data in the index is a key factor in the quality of the results delivered and has a major impact on the information consumption experience. Hydra is designed to give the search solution the tools necessary to modify the data that is to be indexed in an efficient and flexible way. It does this by providing a scalable and efficient pipeline through which documents pass before being indexed into the search engine.
Big data requires a service that can orchestrate and operationalize processes to refine the enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Next Generation Data Platforms - Deon Thomas - Thoughtworks
A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.
When Databases Meet Big Data and Hadoop - Uni of Tromso Online Lecture - Irfan Elahi
Slides of my online lecture that I delivered to the grad students of University of Tromsø (Norway) about
"When Databases Meet Big Data - Expectations, Challenges and Opportunities"
on 13/09/2018.
The lecture provided an overview of what databases have traditionally been used for and of what enterprises and organizations now expect from them with the rise of big data paradigms. With the shift from vertical scaling to horizontal scaling, what challenges germinate in the context of the functional capabilities of databases, and how does it all align with the expectations from big data platforms, which are increasingly being considered for use cases like ETL offloading and scalable data warehousing? Lastly, what opportunities lie in this niche, and what lies beyond?
Deciding on the deployment model is critical when enterprises adopt Hadoop. Initially, the bare-metal model (an on-premise cluster with physical servers) was popular, to avoid I/O overhead in virtualized environments. However, these days the cloud is also a contending option, with its compelling cost savings and ease of operation. To aid in assessing the deployment options, Accenture Technology Labs developed the Accenture Data Platform Benchmark suite and a total cost of ownership (TCO) model, and has tuned and compared the performance of bare-metal Hadoop clusters and a Hadoop cloud service. Interestingly enough, the study discovered that the price/performance ratio is not a critical factor in making a Hadoop deployment decision. Employing empirical and systematic analyses, the study found comparable price/performance ratios for both bare-metal Hadoop clusters and Hadoop-as-a-service. Moreover, cheaper purchasing options (e.g., long-term contracts) provide a better ratio than bare metal in many cases. Thus, this result debunks the idea that the cloud is not suitable for Hadoop MapReduce workloads due to their heavy I/O requirements. Furthermore, the study finds that the Hadoop default configuration leaves ample headroom for performance tuning, and that cloud infrastructure enables even further performance-tuning opportunities.
The TCO Calculator - Estimate the True Cost of Hadoop - MapR Technologies
http://bit.ly/1wsAuRS - There are many hidden costs for Apache Hadoop that have different effects across different Hadoop distributions. With the new MapR TCO calculator organisations have a simple and reliable tool that is based on facts to compare costs.
Lacking the technology to directly leverage Hadoop, some companies are foregoing its full benefits opting to treat Hadoop as just another data source for their legacy BI tools. But storage is only one benefit of Hadoop and ignores its linear scalability and data flexibility across all data types. Using Hadoop natively for both storage and computation in an analytic capacity has already led to dramatic increases in business benefits. Hadoop analytics has already identified over $2B in potential fraud at one of the world’s largest credit card companies. Sears has already reduced reporting times over traditional BI from 12 weeks to 3 days. A major internet security company increased customer conversion by 60% and revenue by $20 million. Meaningful returns are spread across Fortune 100 enterprises and fast growing startups with the common thread being self-service big data analytics leveraging Hadoop’s native capabilities. In this talk, we’ll highlight the core value proposition of building analytics natively on Hadoop, share real-world use cases that resulted in dramatic ROI, and reveal the next major step in visual big data analytics.
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1) - Sascha Dittmann
In this session we use a practical scenario to show how concrete tasks can be solved with HDInsight in practice:
- Fundamentals of HDInsight for Windows Server and Windows Azure
- Working with Windows Azure HDInsight
- Implementing MapReduce jobs with JavaScript and .NET code
Big Data with Hadoop, Spark and BigQuery (Google Cloud Next Extended 2017 Kar... - Imam Raza
Google Next Extended (https://cloudnext.withgoogle.com/) is an annual Google event focusing on Google cloud technologies. This presentation is from a tech talk held at the Google Next Extended 2017 Karachi event.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution - James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro... - BigDataEverywhere
Mohammad Quraishi, Senior IT Principal, Cigna
Like Moses seeing the Promised Land from afar, we knew the big data journey would be worth it, but we didn't know how hard it would be. In this talk, I'll delve into the details of our big data and analytics initiative at Cigna,
Enough talking about Big Data and Hadoop; let's see how Hadoop works in action.
We will locate a real dataset, ingest it into our cluster, connect it to a database, apply some queries and data transformations on it, save our result and show it via a BI tool.
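As a rough stand-in for that workflow, the ingest, query/transform, and collect-for-BI steps might look like the sketch below. The dataset, field names, and threshold are invented for illustration, not the session's actual data:

```python
# Minimal in-memory stand-in for the workflow: ingest rows, apply a
# filter-and-transform "query", and collect the result for a BI tool.
records = [
    {"city": "Montreal", "temp_c": -5},
    {"city": "Toronto", "temp_c": 10},
    {"city": "Vancouver", "temp_c": 25},
]

def transform(rows, min_temp):
    """Keep rows at or above min_temp and add a Fahrenheit column."""
    out = []
    for r in rows:
        if r["temp_c"] >= min_temp:
            # Derive a new column, as a Hive SELECT expression would.
            out.append({**r, "temp_f": r["temp_c"] * 9 / 5 + 32})
    return out

result = transform(records, 0)  # the "query" step; result feeds the BI tool
```

On a real cluster, the list would be files in HDFS and the function a Hive or Pig job; the shape of the pipeline is the same.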
10. Hadoop and Microsoft.
Big engineering investment
• Big Data Business Intelligence tooling
• Big Data Apache Hadoop
• Big Data Parallel Data Warehouse
Open source Commitment
• Apache Software Foundation
• Hortonworks Partnership
We are delivering
• Apache Hadoop on Windows Server
• Apache Hadoop on Windows Azure
11. Microsoft Hadoop Vision.
Better on Windows and Azure
• Active Directory
• System Center
Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market
Microsoft Business Intelligence (BI)
• ODBC Connectivity
12. ACM Hackathon.
Free Hadoop on Azure
• Code: acmhackathon
Free 30 day Azure account
• No credit card
• 750h small compute / 35GB storage
• Email brad@bing.com for code
Hadoop on Azure demo
Editor's Notes
Good afternoon. Thanks for coming; I know you're going to be really excited about this. I'm going to talk about Big Data, Hadoop and Microsoft. It's simply amazing to see the growing momentum around Big Data conversations happening today. Hadoop is changing the conversations that we have about data, Big Data. I want to make sure we stay grounded in thinking through how to make money and save money with your data using Hadoop. <next slide>
Let’s talk about size for a moment. The example I like to use is the US Library of Congress. The US Library of Congress has millions of books, recordings, photographs, maps, music and manuscripts. All put together, they have around 300TB of information. How much is that? That's 838 miles of bookshelves; if you were to stretch those out end to end, then go downstairs, get in your car and start driving at 65mph, you'd hit the end of the books 13 hours later in New York City. A little over three times that is a petabyte. Microsoft is managing well over 100 petabytes of data across our online properties. That single row of bookshelves from New York to Jacksonville, Florida is now half a mile high. That's stunning. We are adding 7.5PB per month of new data, running 20k analytic jobs per day to run our online services business. The good news is that hardware is fast and cheap enough that now we can record this data and consume it. This simply wasn't possible a few years ago. Hard drive density and CPU power continue to double every 18 months. From the Microsoft point of view, we have a pretty good understanding of how to build and operate one of these infrastructures and, in the end, connect it through to developers and end users. We're the only ones in the 100+ petabyte club who also run an enterprise software and cloud business. I see the complete solution as one where we enable developers to build applications on this data, and connect them through to our end users with BI tools to deliver breakthrough Big Data insights. I will talk more about that in the Hadoop and Excel talk and take this down to a practical level in my second talk. This leads me on to the next concept: it's all about the data.
It's all about your data. Actually, it's all about your big data. Is it big as in Volume, where your data exceeds the limits of the physical capabilities of today's systems? Is it Velocity, where the data is moving at a fast rate and its value can decay over time? Is it Variability of structure, from unstructured through semi-structured to highly structured data? (Doug Laney: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf) The answer is: it's all of the above. Now that you have Big Data, you have two problems. You have BIG DATA problems, and you have big DATA PROBLEMS.
The second thing I want to talk about is Hadoop, and how Hadoop is set up to deliver breakthrough business insights from your data. How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud? When people talk about Hadoop, they are often talking about specific computational patterns, including MapReduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault-tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware; today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn't have a monopoly on "big", "real time" or "unstructured", but it does provide some unique capabilities.
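The MapReduce pattern described in this note can be illustrated with a toy, single-machine word count. This is plain Python, not Hadoop's actual API; real Hadoop distributes the map, shuffle/sort, and reduce steps across a cluster:

```python
# Toy word count showing the map and reduce phases of MapReduce.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum counts per word after a sort (Hadoop's shuffle/sort)."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {word: sum(c for _, c in grp)
            for word, grp in groupby(pairs, key=itemgetter(0))}
```

The same two functions, run in parallel over shards of a huge dataset with the sort done by the framework, is essentially what a Hadoop job does.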
It's everywhere to be mined, but we have what one can call "the pomegranate problem". Imagine all of your data being inside a pomegranate. When you eat a pomegranate, it's a bit difficult getting all of the little pieces inside the pomegranate out; it's a bit of work. That's the process you need to go through to extract business insights out of your data. It's useful to think of it this way: your data is the platform, not the tooling that surrounds it. It's all about the data. I'd like to share with you my favorite big data quotation from a famous Big Data philosopher. <next slide>
We don't have a Hadoop problem; we have analytics, pattern mining, trend analysis, statistical inferencing, economic modeling, market-regression-level problems. Big Data, in terms of data size, variability and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't lead to solving business objectives. Data science starts where utility-class services like Big Data Hadoop end. The real opportunity is for data science as a hosted petascale service on top of cloud infrastructure. As powerful as Hadoop is, today it's still more of a computer scientist's or academically trained analyst's tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business-unit personnel. Hadoop data is often more "raw" and "wild" than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I, and Microsoft, see opportunity. Essentially: wouldn't it be cool if mere mortals could use this stuff and consume insights that come directly from Hadoop?
I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.