The document discusses Microsoft's investment in and vision for big data and Apache Hadoop. Microsoft is delivering Apache Hadoop running on both Windows Server and Windows Azure, as part of its open source commitment. It is also integrating Hadoop with Microsoft technologies like Active Directory, SQL Server, and Azure Storage to provide better experiences on Windows and Azure.
Big Data is one of the hot topics and has got the attention of the IT industry globally. It is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. More accurate analyses may lead to more confident decision making. And better decisions can mean greater operational efficiencies, cost reductions and reduced risk.
This presentation focuses on the why, what, and how of big data as we explore some of Microsoft's big data solutions, the Azure HDInsight service and Power BI, providing insights into the world of big data.
How to boost your data management with Dremio? - Vincent Terrasi
Works with any source: relational, non-relational, third-party apps. Five years ago nobody was using Hadoop or MongoDB, and five years from now there will be new products. You need a solution that is future proof.
Works with any BI tool. In every company multiple tools are in use, and each department has its favorite. We need to work with all of them.
No ETL, data warehouses, or cubes. It needs to give you a genuinely good alternative to these options.
Makes data self-service, collaborative. Probably most important of all, we need to change the dynamic between the business and IT. We need to make it so business users can get the data they want, in the shape they want it, without waiting on IT.
Makes Big Data feel small. It needs to make billions of rows feel like a spreadsheet on your desktop.
Open source. It’s 2017, so we think this has to be open source.
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013 - Jen Stirrup
This session focused on data visualisation using Power BI, based on big data. Some examples of Hive and HDFS file storage are given. An overview of Microsoft HDInsight is supplied.
Apache Hadoop is a platform that has emerged to help extract insight from all that data. In this session, you will learn the basics of Hadoop, how to get up and running with Hadoop in the cloud using Microsoft Azure HDInsight, and how you can leverage the deeper integration of Visual Studio to integrate Big Data with your existing applications. No previous experience with Hadoop is required.
Presented @ MSDEVMTL on Saturday, February 2015
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...) - Stéphane Fréchette
How is Big Data moved around? How are you planning to move it?
This session will focus on familiar and not-so-familiar tools you can use today for moving and integrating Big Data, and will also outline the underlying technologies and platform (an introduction to Big Data, Hadoop, HDInsight and tools). We will compare and outline the options, discuss how they can work with your existing Hadoop and Windows Azure environment, and provide some guidance on when and how to use each of these tools.
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data... - Databricks
How did Devon move from a traditional reporting and data warehouse approach to a modern data lake? What did it take to go from a slow and brittle technical landscape to a flexible, scalable, and agile platform? In the past, Devon addressed data solutions in dozens of ways depending on the user and the requirements. Through a visionary program, driven by Databricks, Devon has begun a transformation of how it consumes data and enables engineers, analysts, and IT developers to deliver data-driven solutions along all levels of the data analytics spectrum. We will share the vision, technical architecture, influential decisions, and lessons learned from our journey. Join us to hear the unique Databricks success story at Devon.
Here I talk about examples and use cases for Big Data & Big Data Analytics and how we accomplished massive-scale sentiment, campaign and marketing analytics for Razorfish using a collection of database, Big Data and analytics technologies.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
Philip Howard, industry analyst and database technology expert from Bloor Research International will present recent market research results and discuss the best and latest solutions as well as provide advice for identifying the right match for specific use cases. DataStax will also share case studies where DSE Graph technology is being applied to transform the customer experience in industry sectors such as Financial Services, Retail, Telecommunications, Logistics, Media and Entertainment. Attend this webinar to find out more about graph database technology, all the choices on the market today and how you can transform your own technical solutions and customer experience.
Webinar recording: https://youtu.be/s0Hozx_bdZ4
For current and on-demand DataStax webinars, visit: http://www.datastax.com/resources/webinars
To transform your organization and unlock the value of your data, you need a way to ingest, store and analyze every type of data in your organization.
This presentation covers the Data Access Layer of the Hadoop Ecosystem which enables you to achieve this.
We will use the HDP (Hortonworks Data Platform) reference architecture to walk through the Hadoop core and its ecosystem with focus on the data access layer.
We will cover some of the prominent tools of the ecosystem such as Pig, Hive, Sqoop, Flume and Oozie and how they are used for ingesting data into Hadoop from structured, unstructured and streaming sources.
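These tools are driven by scripts and configuration rather than application code, but the kind of aggregation a Hive query expresses (for example, a GROUP BY over ingested rows) can be sketched in plain Python. The rows and function below are hypothetical stand-ins for illustration, not any of these tools' actual APIs:

```python
# Toy illustration of the aggregation Hive would express as
# SELECT dept, COUNT(*) FROM staff GROUP BY dept,
# written in plain Python so the logic is visible.
from collections import Counter

raw_rows = [
    "alice,engineering",
    "bob,sales",
    "carol,engineering",
]

def group_count(rows):
    """Count rows per department, mimicking a GROUP BY aggregation."""
    counts = Counter()
    for row in rows:
        _name, dept = row.split(",")
        counts[dept] += 1
    return dict(counts)
```

In a real cluster the same statement would run as distributed jobs over data that Sqoop or Flume had already landed in HDFS.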
Talk to us at +91 80 6567 9700 or send an email to training@springpeople.com for more information.
How to get started in Big Data without Big Costs - StampedeCon 2016 - StampedeCon
Looking to implement Hadoop but haven’t pulled the trigger yet? You are not alone. Many companies have heard the hype about how Hadoop can solve the challenges presented by big data, but few have actually implemented it. What’s preventing them from taking the plunge? Can it be done in small steps to ensure project success?
This session will discuss some of the items to consider when getting started with Hadoop and how to go about making the decision to move to the de facto big data platform. Starting small can be a good approach when your company is learning the basics and deciding what direction to take. There is no need to invest large amounts of time and money up front if a proof of concept is all you aim to provide. Using well known data sets on virtual machines can provide a low cost and effort implementation to know if your big data journey will be successful with Hadoop.
Hydra - Content Processing Framework for Search Driven Solutions - Findwise
Presented at Lucene Revolution, 7-8 May in Boston and Berlin Buzzwords 4-5 June, 2012.
When working with free text search, the quality of the data in the index is a key factor in the quality of the results delivered and has a major impact on the information consumption experience. Hydra is designed to give the search solution the tools necessary to modify the data that is to be indexed in an efficient and flexible way. It does this by providing a scalable and efficient pipeline through which documents pass before being indexed into the search engine.
Big data requires a service that can orchestrate and operationalize processes to refine the enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Next Generation Data Platforms - Deon Thomas - Thoughtworks
A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.
When Databases Meet Big Data and Hadoop - Uni of Tromso Online Lecture - Irfan Elahi
Slides of my online lecture that I delivered to the grad students of University of Tromsø (Norway) about
"When Databases Meet Big Data - Expectations, Challenges and Opportunities"
on 13/09/2018.
The lecture provided an overview of what databases have traditionally been used for and of what enterprises and organizations now expect from them with the rise of big data paradigms. With the shift from vertical scaling to horizontal scaling, what challenges germinate in the context of the functional capabilities of databases, and how does it all align with the expectations from big data platforms, which are increasingly being considered for use cases like ETL offloading and scalable data warehousing? Lastly, what opportunities lie in this niche, and what lies beyond?
Deciding on the deployment model is critical when enterprises adopt Hadoop. Initially, the bare-metal model (an on-premise cluster with physical servers) was popular, to avoid I/O overhead in virtualized environments. However, these days the cloud is also a contending option, with its compelling cost savings and ease of operation. To aid in assessing the deployment options, Accenture Technology Labs developed the Accenture Data Platform Benchmark suite and a total cost of ownership (TCO) model, and has tuned and compared the performance of bare-metal Hadoop clusters and a Hadoop cloud service. Interestingly enough, the study discovered that the price/performance ratio is not a critical factor in making a Hadoop deployment decision. Employing empirical and systematic analyses, the study found comparable price/performance ratios for both bare-metal Hadoop clusters and Hadoop-as-a-service. Moreover, cheaper purchasing options (e.g., long-term contracts) provide a better ratio than bare metal in many cases. Thus, this result debunks the idea that the cloud is not suitable for Hadoop MapReduce workloads due to their heavy I/O requirements. Furthermore, the study finds that the Hadoop default configuration leaves ample headroom for performance tuning, and that cloud infrastructure enables even further performance-tuning opportunities.
The TCO Calculator - Estimate the True Cost of Hadoop - MapR Technologies
http://bit.ly/1wsAuRS - There are many hidden costs for Apache Hadoop that have different effects across different Hadoop distributions. With the new MapR TCO calculator organisations have a simple and reliable tool that is based on facts to compare costs.
Lacking the technology to directly leverage Hadoop, some companies are foregoing its full benefits opting to treat Hadoop as just another data source for their legacy BI tools. But storage is only one benefit of Hadoop and ignores its linear scalability and data flexibility across all data types. Using Hadoop natively for both storage and computation in an analytic capacity has already led to dramatic increases in business benefits. Hadoop analytics has already identified over $2B in potential fraud at one of the world’s largest credit card companies. Sears has already reduced reporting times over traditional BI from 12 weeks to 3 days. A major internet security company increased customer conversion by 60% and revenue by $20 million. Meaningful returns are spread across Fortune 100 enterprises and fast growing startups with the common thread being self-service big data analytics leveraging Hadoop’s native capabilities. In this talk, we’ll highlight the core value proposition of building analytics natively on Hadoop, share real-world use cases that resulted in dramatic ROI, and reveal the next major step in visual big data analytics.
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1) - Sascha Dittmann
In this session we use a practical scenario to show how concrete tasks can be solved with HDInsight in practice:
- Fundamentals of HDInsight for Windows Server and Windows Azure
- Working with Windows Azure HDInsight
- Implementing MapReduce jobs with JavaScript and .NET code
Big Data with Hadoop, Spark and BigQuery (Google Cloud Next Extended 2017 Kar... - Imam Raza
Google Next Extended (https://cloudnext.withgoogle.com/) is an annual Google event focusing on Google cloud technologies. This presentation is from a tech talk held at the Google Next Extended 2017 Karachi event.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution - James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro... - BigDataEverywhere
Mohammad Quraishi, Senior IT Principal, Cigna
Like Moses seeing the Promised Land from afar, we knew the big data journey would be worth it, but we didn't know how hard it would be. In this talk, I'll delve into the details of our big data and analytics initiative at Cigna,
Enough talking about Big Data and Hadoop; let's see how Hadoop works in action.
We will locate a real dataset, ingest it into our cluster, connect it to a database, apply some queries and data transformations on it, save our result and show it via a BI tool.
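As a rough stand-in for that workflow, the ingest, query/transform, and collect-for-BI steps might look like the sketch below. The dataset, field names, and threshold are invented for illustration, not the session's actual data:

```python
# Minimal in-memory stand-in for the workflow: ingest rows, apply a
# filter-and-transform "query", and collect the result for a BI tool.
records = [
    {"city": "Montreal", "temp_c": -5},
    {"city": "Toronto", "temp_c": 10},
    {"city": "Vancouver", "temp_c": 25},
]

def transform(rows, min_temp):
    """Keep rows at or above min_temp and add a Fahrenheit column."""
    out = []
    for r in rows:
        if r["temp_c"] >= min_temp:
            # Derive a new column, as a Hive SELECT expression would.
            out.append({**r, "temp_f": r["temp_c"] * 9 / 5 + 32})
    return out

result = transform(records, 0)  # the "query" step; result feeds the BI tool
```

On a real cluster, the list would be files in HDFS and the function a Hive or Pig job; the shape of the pipeline is the same.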
10. Hadoop and Microsoft.
Big engineering investment
• Big Data Business Intelligence tooling
• Big Data Apache Hadoop
• Big Data Parallel Data Warehouse
Open source Commitment
• Apache Software Foundation
• Hortonworks Partnership
We are delivering
• Apache Hadoop on Windows Server
• Apache Hadoop on Windows Azure
11. Microsoft Hadoop Vision.
Better on Windows and Azure
• Active Directory
• System Center
Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market
Microsoft Business Intelligence (BI)
• ODBC Connectivity
12. ACM Hackathon.
Free Hadoop on Azure
• Code: acmhackathon
Free 30 day Azure account
• No credit card
• 750h small compute / 35GB storage
• Email brad@bing.com for code
Hadoop on Azure demo
Editor's Notes
Good afternoon. Thanks for coming; I know you're going to be really excited about this. I'm going to talk about Big Data, Hadoop and Microsoft. It's simply amazing to see the growing momentum around Big Data conversations happening today. Hadoop is changing the conversations that we have about data, Big Data. I want to make sure we stay grounded in thinking through how to make money and save money with your data using Hadoop. <next slide>
Let’s talk about size for a moment. The example I like to use is the US Library of Congress. The US Library of Congress has millions of books, recordings, photographs, maps, music and manuscripts. All put together, they have around 300TB of information. How much is that? That's 838 miles of bookshelves; if you were to stretch those out end to end, then go downstairs, get in your car and start driving at 65mph, you'd hit the end of the books 13 hours later in New York City. A little over three times that is a petabyte. Microsoft is managing well over 100 petabytes of data across our online properties. That single row of bookshelves from New York to Jacksonville, Florida is now half a mile high. That's stunning. We are adding 7.5PB per month of new data, running 20k analytic jobs per day to run our online services business. The good news is that hardware is fast and cheap enough that now we can record this data and consume it. This simply wasn't possible a few years ago. Hard drive density and CPU power continue to double every 18 months. From the Microsoft point of view, we have a pretty good understanding of how to build and operate one of these infrastructures and, in the end, connect it through to developers and end users. We're the only ones in the 100+ petabyte club who also run an enterprise software and cloud business. I see the complete solution as one where we enable developers to build applications on this data, and connect them through to our end users with BI tools to deliver breakthrough Big Data insights. I will talk more about that in the Hadoop and Excel talk and take this down to a practical level in my second talk. This leads me on to the next concept: it's all about the data.
It's all about your data. Actually, it's all about your big data. Is it big as in Volume, where your data exceeds the limits of the physical capabilities of today's systems? Is it Velocity, where the data is moving at a fast rate and its value can decay over time? Is it Variability of structure, from unstructured through semi-structured to highly structured data? (Doug Laney: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf) The answer is: it's all of the above. Now that you have Big Data, you have two problems. You have BIG DATA problems, and you have big DATA PROBLEMS.
The second thing I want to talk about is Hadoop, and how Hadoop is set up to deliver breakthrough business insights from your data. How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud? When people talk about Hadoop, they are often talking about specific computational patterns, including MapReduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault-tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware; today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn't have a monopoly on "big", "real time" or "unstructured", but it does provide some unique capabilities.
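The MapReduce pattern described in this note can be illustrated with a toy, single-machine word count. This is plain Python, not Hadoop's actual API; real Hadoop distributes the map, shuffle/sort, and reduce steps across a cluster:

```python
# Toy word count showing the map and reduce phases of MapReduce.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum counts per word after a sort (Hadoop's shuffle/sort)."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {word: sum(c for _, c in grp)
            for word, grp in groupby(pairs, key=itemgetter(0))}
```

The same two functions, run in parallel over shards of a huge dataset with the sort done by the framework, is essentially what a Hadoop job does.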
It's everywhere to be mined, but we have what one can call "the pomegranate problem". Imagine all of your data being inside a pomegranate. When you eat a pomegranate, it's a bit difficult getting all of the little pieces inside the pomegranate out; it's a bit of work. That's the process you need to go through to extract business insights out of your data. It's useful to think of it this way: your data is the platform, not the tooling that surrounds it. It's all about the data. I'd like to share with you my favorite big data quotation from a famous Big Data philosopher. <next slide>
We don't have a Hadoop problem; we have analytics, pattern mining, trend analysis, statistical inferencing, economic modeling, market-regression-level problems. Big Data, in terms of data size, variability and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't lead to solving business objectives. Data science starts where utility-class services like Big Data Hadoop end. The real opportunity is for data science as a hosted petascale service on top of cloud infrastructure. As powerful as Hadoop is, today it's still more of a computer scientist's or academically trained analyst's tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business-unit personnel. Hadoop data is often more "raw" and "wild" than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I, and Microsoft, see opportunity. Essentially: wouldn't it be cool if mere mortals could use this stuff and consume insights that come directly from Hadoop?
I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.