Addressing Enterprise Customer Pain Points with a Data Driven Architecture

•Download as POTX, PDF•

1 like•570 views

Customers that are implementing Big Data Analytics projects in enterprise environments driven by line of business applications are faced with the three critical issues of Managing Complexity, Data Movement and Replication, and Cloud Integration. In this session you will learn about the characteristics of these pain points and how designing and implementing a data driven approach enables enterprises to implement quickly and efficiently with a future proof architecture of hybrid cloud.

Technology

Data Driven with Data
Visionaries
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---1

Are you linear or are you parallel?
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---2

© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---3
Distributed
Dynamic
Diverse

Enterprise Data =
 Secure  Freedom
 Easy  Relevance
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---
4

Enterprise Customer Pain Points
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---5
Managing Complexity
Data Movement and Replication
Cloud Integration

A Linear Scenario
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---6
Project 5
Project 1
Project 2 Project 4
Project 3

A Parallel Scenario
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---7
Project 1
Project 4
Project 2
Project 3
Project 5
N e t A p p
D A T A F A B R I C

© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---8
ENTERPRISE IT NEXT-GEN DATA CENTER
N e t A p p
D A T A F A B R I C
MULTI-CLOUD
PUBLIC CLOUD
Manage Complexity
Data Movement and Replication
Cloud Integration

NetApp Data Fabric Example
NFS Connector
June 2017 © 2017 NetApp, Inc. All rights reserved. NETAPP CONFIDENTIAL INTERNAL USE ONLY9
Production Backup & DR
SnapMirror® SnapVault®
Your data is already
on ONTAP
And it’s protected by
ONTAP1 2
Run HDP In-Place on-Prem with ONTAP
and FAS NFS Connector for Hadoop3
NFS
SnapMirror®
NFS Connector
Hortonworks
Data Platform on
Provider’s IaaS
Hortonworks
Data Cloud
PaaS
or
S3
or
NetApp Private
Storage
SnapMirror®
NFS
Connector
NFS
Connector
Run HDP In-Place on NetApp Private
Storage with the FAS NFS Connector
for Hadoop on the cluster in the cloud
3
NetApp All Flash FAS NetApp All Flash FAS NetApp All Flash FAS
Analytics Cluster

© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---10
LINEAR
Node Centric
Outside
Step and Sequential
PARALLEL
Data Centric
Inside
Match and Move

Data Driven
Architecture
Enterprise Adoption
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---11

Data is your most valuable business asset and it's also your biggest challenge. This challenge and opportunity means we continually face significant road blocks toward becoming a data driven organisation. From the management of data, to the bubbling open source frameworks, the limited industry skills to surmounting time and cost pressures, our challenge in data is big. We all want and need a “fit for purpose” approach to management of data, especially Big Data, and overcoming the ongoing challenges around the ‘3Vs’ means we get to focus on the most important V - ‘Value’.Come along and join the discussion on how Oracle Big Data Cloud provides Value in the management of data and supports your move toward becoming a data driven organisation. Speaker Noble Raveendran, Principal Consultant, Oracle

Enterprise large scale graph analytics and computing base on distribute graph...

DataWorks Summit

Graph approaches to structuring, analyzing data have been a significant area of interest, Graphs are well-suited to expressing complex interconnections and clusters of highly related entities. Large-scale graph analytics research is growing fast in recent years, to leverage Hadoop2 ecosystem for graph is a good approach, enterprise graph computer requires to store large graph and do fast computing against graph. One for the OLTP database systems which allow the user to query the graph in real-time, Hbase as the distributed NOSql database can be the backend storage to persistent large graph, the property graph stored its vertices and edges in key-value pairs in Hbase, it also provide highly reliable, scalable and fault tolerant to the data, Solr as the distributed indexing will make the query more efficient. Titan itself will handle cache, transaction; And another for the OLAP analytics systems, use TinkerPop hadoop gremlin SparkGraphComputer to processed a large graph, every vertex and edge is analyzed, a cluster-computing platform will help for the processing of large distributed in memory graph datasets. Graph DB base on Hbase/Solr and graph computing analysis base on spark is powerful for discovering valuable information about relationships in complex and large data, representing significant business opportunity in enterprise. It will help graph data analytics in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.

High Performance Spatial-Temporal Trajectory Analysis with Spark

DataWorks Summit/Hadoop Summit

Data-In-Motion Unleashed

DataWorks Summit

Big Data at your Desk with KNIME

DataWorks Summit/Hadoop Summit

Hadoop Journey at Walgreens

DataWorks Summit

Prior to 2014, Walgreens has traditional Enterprise Datawarehouse Systems that have reached the capacity limits. Over the last three years we have evolved, learned lessons, experienced successes and failures. Our initial adoption of Hadoop came from the need to run complex analytics which simply did not scale on MPP RDBMS. Our business data demands were rapidly increasing and the 8 to 12 weeks concomitant extract, transform, and load turn around cycles was not a acceptable deliverable timeframe in the retail space. A self service model where data lands on a distributed platform, apply schema where necessary, and process at scale was a necessary paradigm for business value enablement. Our journey started with single use case which has now evolved to enterprise data hub. We will discuss following points: Evolution of our infrastructure profile, streamlining the hardware provisioning cycle, and our hybrid deployment model (on premise & cloud). Operations, how SmartSense has helped us proactively tune our cluster, and which operational tests we use for benchmarking the cluster. Monitoring, how we monitor and the tools required for enterprise grade monitoring. Security and governance how we progressed from non–compliance to enterprise grade using Ranger, Knox, Kerberos, HP voltage, encryption at rest, and many other services. 3rd Party integration with HDP, what we learned and how we overcame the challenges. Lastly, how we approach our disaster recovery strategy, what is driving the need for a DR and the key capabilities required.

The DAP - Where YARN, HBase, Kafka and Spark go to Production

DataWorks Summit/Hadoop Summit

Built-In Security for the Cloud

DataWorks Summit

Today enterprises desire to move more and more of their data lakes to the cloud to help them execute faster, increase productivity, drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with risks and challenges in the areas of data security, privacy, and governance. In this talk we cover how enterprises can overcome governance and security obstacles to leverage these new advances that the cloud can provide to ease the management of their data lakes in the cloud. We will also show how the enterprise can have consistent governance and security controls in the cloud for their ephemeral analytic workloads in a multi-cluster cloud environment without sacrificing any of the data security and privacy/compliance needs that their business context demands. Additionally, we will outline some use cases and patterns as well as best practices to rationally manage such a multi-cluster data lake infrastructure in the cloud. Speaker: Jeff Sposetti, Product Management, Hortonworks

Hadoop and Spark are big data frameworks used to extract useful span a variety of scenarios from ingestion, data prep, data management, processing, analyzing and visualizing data. Each step requires specialized toolsets to be productive. In this talk I will share solution examples in the Big Data ecosystem such as Cask, StreamSets, Datameer, AtScale, Dataiku on Microsoft’s Azure HDInsight that simplify your Big Data solutions. Azure HDInsight is a cloud Spark and Hadoop service for the enterprise and take advantage of all the benefits of HDInsight giving you the best of both worlds. Join this session for practical information that will enable faster time to insights for you and your business.

Securing your Big Data Environments in the Cloud

DataWorks Summit

Big Data tools are becoming a critical part of enterprise architectures and as such securing the data, at rest, and in motion is a necessity. More so, when you’re implementing these solutions in the cloud and the data doesn't reside within the confines of your trusted data center. Also, there is a fine balance between implementing enterprise-grade security and negotiating utmost performance given the overheads of encryption and/or identity management. This session is designed to tackle these challenges head on and explain the various options available in the cloud. The focal points are the implementation of tools like Ranger and Knox for cloud deployments, but we also pay attention to the security features offered in the cloud that complement this process and secure the data in unprecedented ways. Cloud Security + OSS Security tools are a deadly combination, when it comes to securing your Data Lake.

Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud

DataWorks Summit/Hadoop Summit

The world’s largest enterprises run their infrastructure on Oracle, DB2 and SQL and their critical business operations on SAP applications. Organisations need this data to be available in real-time to conduct necessary analytics. However, delivering this heterogeneous data at the speed it’s required can be a huge challenge because of the complex underlying data models and structures and legacy manual processes which are prone to errors and delays. Unlock these silos of data and enable the new advanced analytics platforms by attending this session. Find out how to: • To overcome common challenges faced by enterprises trying to access their SAP data • You can integrate SAP data in real-time with change data capture (CDC) technology • Organisations are using Attunity Replicate for SAP to stream SAP data in to Kafka

Scaling Data Science on Big Data

DataWorks Summit

Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises, however this promise also comes with challenges for data scientists to continuously learn and collaborate. Data Scientists have many tools at their disposal such as notebooks like Juypter and Apache Zeppelin & IDEs such as RStudio with languages like R, Python, Scala and frameworks like Apache Spark. Given all the choices how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production ? In this session learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP with the focus on integration, security and model deployment and management. Speakers: Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM Vikram Murali, Program Director, Data Science and Machine Learning, IBM

Admiral Group

DataWorks Summit/Hadoop Summit

Big Data in the Cloud - The What, Why and How from the Experts

DataWorks Summit/Hadoop Summit

Hadoop has traditionally been an on-premises workload, with very few notable implementations on the cloud. With Organizations either having jumped on the cloud bandwagon or have started planning their expansion into the ecosystem, it is imperative for us to explore how Hadoop conforms to the cloud paradigm. With the coming off age of some very useful cloud paradigms and the nature of Big Data with high seasonality of workloads, this is becoming a very common ask from customers. Robust architectures, elastic scale, open platforms, OSS integrations, and addressing complex pain points will all be part of this lively talk. To be able to implement effective solutions for Big Data in the cloud it is imperative that you understand the core principles and grasp the design principles of how the cloud can enhance the benefits of parallelized analytics. Join this session to understand the nitty-gritties of implementing Big Data in the cloud and the various options therein. Big Data + Cloud is definitely a deadly combination.

GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...

DataWorks Summit

GeoWave is an open-source library that connects geospatial software with distributed computing frameworks. GeoWave leverages the scalability of a distributed key-value store for effective storage, retrieval, and analysis of massive geospatial datasets. It uses a space filling curve to preserve locality between multi-dimensional objects and the single dimensional sort order imposed by key-value stores. What this means to a user is that distributed spatial and spatial-temporal retrieval and analysis can be effectively accomplished at a massive scale. At its core, GeoWave solves the problem of multi-dimensional indexing, and particularly extends this capability to spatial/temporal use cases. GeoWave supports raster, vector, and point cloud data, and provides common spatial algorithms that can be extended to create deep analytic capabilities. It also performs fast subsampling via distributed rendering that integrates with GeoServer, so that a user can interactively visualize data at map scale regardless of density. Our goal in presenting GeoWave to the Hadoop Summit is to introduce it to the big data community. We will present GeoWave at a moderate level of detail, to include a short demonstration, and hopefully answer any questions regarding maturity, suitability and implementation details.

Verizon Centralizes Data into a Data Lake in Real Time for Analytics

DataWorks Summit

Verizon – Global Technology Services (GTS) was challenged by a multi-tier, labor-intensive process when trying to migrate data from disparate sources into a data lake to create financial reports and business insights. Join this session to learn more about how Verizon: • Easily accessed data from multiple sources including SAP data • Ingested data into major targets including Hadoop • Achieved real-time insights from data leveraging change data capture (CDC) technology • Reduced costs and labor

Big SQL: Powerful SQL Optimization - Re-Imagined for open source

DataWorks Summit

Let's be honest - there are some pretty amazing capabilities locked in proprietary SQL engines which have had decades of R&D baked into them. At this session, learn how IBM, working with the Apache community, has unlocked the value of their SQL optimizer for Hive, HBase, ObjectStore, and Spark - helping customers avoid lock-in while providing best performance, concurrency and scalability for complex, analytical SQL workloads. You'll also learn how the SQL engine was extended and integrated with Ambari, Ranger, YARN/Slider and HBase. We share the results of this project which has enabled running all 99 TPC-DS queries at world record breaking 100TB scale factor.

Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud

DataWorks Summit

Dynamic DDL: Adding structure to streaming IoT data on the fly

DataWorks Summit

At the end of day the only thing that data scientists want is one thing. They want tabular data for their analysis. They do not want to spend hours or days preparing data. How does a data engineer handle the massive amount of data that is being streamed at them from IoT devices and apps and at the same time add structure to it so that data scientists can focus on finding insights and not preparing data? By the way, you need to do this within minutes (sometimes seconds). Oh... and there are a bunch more data sources that you need to ingest and the current providers of data are changing their structure. At GoPro, we have massive amounts of heterogeneous data being streamed at us from our consumer devices and applications, and we have developed a concept of "dynamic DDL" to structure our streamed data on the fly using Spark Streaming, Kafka, HBase, Hive, and S3. The idea is simple. Add structure (schema) to the data as soon as possible. Allow the providers of the data to dictate the structure. And automatically create event-based and state-based tables (DDL) for all data sources to allow data scientists to access the data via their lingua franca, SQL, within minutes.

Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...

DataWorks Summit/Hadoop Summit

The challenge of computing big data for evolving digital business processes demands variety of computation techniques and engines (SQL, OLAP, time-series, graph, document store), but working in unified framework. A simple architecture of data transformations while ensuring the security, governance, and operational administration are the necessary critical components for enterprise production environments supporting day-to-day business processes. In this session, you will learn about best practices & critical components to ensure business value from latest production deployments. Hear how existing customers are using SAP Vora and the value they have achieved so far with this in-memory engine for distributed data processing. The session provides you with a clear understanding how SAP Vora and open source components like Apache Hadoop and Apache Spark offer an architecture that supports a wide variety of use cases and industries. You will also receive very useful insight where to find development resources, test drive demos, and general documentation.

How Apache Spark and Apache Hadoop are being used to keep banking regulators ...

DataWorks Summit

The global financial crisis showed that traditional IT systems at banks were ill equiped to monitor and manage the daily-changing risk landscape during the global financial crisis. The sheer amount of data that needed to be crunched meant that many of the banks were day(s) behind in calculating, understanding and reporting their risk positions. Post crisis, a review by banking regulator, led the regulators to introduce a new legislation BCBS 239: Principles for effective risk data aggregation and reporting, that requires banks to meet more stringent (timeliness) requirement, in their ability to aggregate and report on their quickly-changing risk positions or risk fines to the tune of $millions. To meet these new requirements, banks have been forced to re-think their traditional IT architectures, which are unable to cope with sheer volume of risk data, and are instead turning to Apache Hadoop and Apache Spark to build out next generation of risk systems. In this talk you will discover, how some of the leading banks in the world are leveraging Apache Hadoop and Apache Spark to meet BCBS 239 regulation. Speaker Kunal Taneja

Security, ETL, BI & Analytics, and Software Integration

DataWorks Summit

Liberty Mutual Enterprise Data Lake Use Case Study By building a data lake, Liberty Mutual Insurance Group Enterprise Analytics department has created a platform to implement various big data analytic projects. We will share our journey and how we leveraged Hortonworks Hadoop distribution and other open source technologies to meet our project needs. This session will cover data lake architecture, security, and use cases.

Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon

DataWorks Summit/Hadoop Summit

Delivering Data Science to the Business

DataWorks Summit

DataWorks Summit 2017 - Sydney Keynote Madhu Kochar, Vice President, Analytics Product Development and Client Success, IBM Data science holds the promise of transforming businesses and disrupting entire industries. However, many organizations struggle to deploy and scale key technologies such as machine learning and deep learning. IBM will share how it is making data science accessible to all by simplifying the use of a range of open source technologies and data sources, including high performing and open architectures geared for cognitive workloads.

Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...

DataWorks Summit

Progressive Insurance is well known for its innovative use of data to better serve its customers, and the important role that Hortonworks Data Platform has played in that transformation. However, as with most things worth doing, the path to the Data Lake was not without its challenges. In this session, I’ll share our top use cases for Hadoop – including telematics and display ads, how a skills shortage turned supporting these applications into a nightmare, and how – and why – we now use Syncsort DMX-h to accelerate enterprise adoption by making it quick and easy (or faster and easier) to populate the data lake – and keep it up to date – with data from across the enterprise. I’ll discuss the different approaches we tried, the benefits of using a tool vs. open source, and how we created our Hadoop Ingestor app using Syncsort DMX-h.

Filling the Data Lake

DataWorks Summit/Hadoop Summit

Log I am your father

DataWorks Summit/Hadoop Summit

Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...

DataWorks Summit/Hadoop Summit

In 2015/16 Worldpay deployed it's Enterprise Data Platform - a highly secure cluster used for analysis of over 65 Billion card transactions and the subject of last years Hadoop Summit Keynote in Dublin. A year on and we are now rapidly expanding our platform with true multi-tenancy. For our first tenant we have build and deployed the analytics and reporting for our central platforms. Our second tenant is to deploy 'decision engines' into our core business systems. These allow Worldpay to make decisions derived from machine learning on how we authorise and route payments traffic and how these affect the consumer, merchant and other business partners. We are also developing other tenant for systems management and security. This talk will look at what it means to have truly have a single enterprise data lake and multiple tenants that share that data and look forward to how we will extend the platform in 2017 with Hadoop 3.

Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...

Amazon Web Services

,When data is the lifeblood of your organisation, best in class data management and protection practices are a no-brainer. With NetApp and AWS, there are a wide menu of data services available to help with things like backup and disaster recovery, accelerating DevOps, data warehouses and analytics, and running high performance business applications to mention a few. With NetApp and AWS, how can you ensure that you don’t compromise on things like cost, performance, security or manageability? With the right cloud data management solution, why not have the best of both worlds and get ahead!In this session, learn what the secret sauce is to optimise the foundation of your cloud data management and how enterprise customers like Monash University and REA Group have been leveraging the economics of cloud for their needs.Yours Sincerely,NetApp, the Cloud Data Management Experts DP (Hons), PhD(DataFabric) Speakers: Tiedan Yu, Senior Storage Engineer, Monash University Jesse Pratt, Infrastructure Manager, REA Group Matt Moore, Hybrid Cloud Architect, NetApp Keiran McCartney, Alliances & Solutions Manager, NetApp

Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp

MongoDB

What's hot

Build Big Data Enterprise solutions faster on Azure HDInsight

DataWorks Summit

Securing your Big Data Environments in the Cloud

DataWorks Summit

Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud

DataWorks Summit/Hadoop Summit

Scaling Data Science on Big Data

DataWorks Summit

Admiral Group

DataWorks Summit/Hadoop Summit

Big Data in the Cloud - The What, Why and How from the Experts

DataWorks Summit/Hadoop Summit

GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...

DataWorks Summit

Verizon Centralizes Data into a Data Lake in Real Time for Analytics

DataWorks Summit

Big SQL: Powerful SQL Optimization - Re-Imagined for open source

DataWorks Summit

Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud

DataWorks Summit

Dynamic DDL: Adding structure to streaming IoT data on the fly

DataWorks Summit

Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...

DataWorks Summit/Hadoop Summit

How Apache Spark and Apache Hadoop are being used to keep banking regulators ...

DataWorks Summit

Security, ETL, BI & Analytics, and Software Integration

DataWorks Summit

Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon

DataWorks Summit/Hadoop Summit

Delivering Data Science to the Business

DataWorks Summit

Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...

DataWorks Summit

Filling the Data Lake

DataWorks Summit/Hadoop Summit

Log I am your father

DataWorks Summit/Hadoop Summit

Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...

DataWorks Summit/Hadoop Summit

What's hot (20)