Optimized Data Management with Cloudera 5.7: Understanding data value with Cl... (Cloudera, Inc.)
Across all industries, organizations are embracing the promise of Apache Hadoop to store and analyze data of all types, at larger volumes than ever before possible. But to tap into the true value of this data, organizations need to manage it and its associated metadata to understand its context, see how it’s changing, and take action on it.
Cloudera Navigator is the only integrated data management and governance solution for Hadoop and is designed to do exactly this. With Cloudera 5.7, we have further expanded the capabilities in Cloudera Navigator to make it even easier to understand your data and maintain metadata consistency as it moves through Hadoop.
Big Data Business Wins: Real-time Inventory Tracking with Hadoop (DataWorks Summit)
MetaScale is a subsidiary of Sears Holdings Corporation that provides big data technology solutions and services focused on Hadoop. It helped Sears implement a real-time inventory tracking system using Hadoop and Cassandra to create a single version of inventory data across different legacy systems. This allowed inventory levels to be updated in real-time from POS data, reducing out-of-stocks and improving the customer experience.
Enterprise Data Hub: The Next Big Thing in Big Data (Cloudera, Inc.)
If you missed Strata + Hadoop World, you missed quite a bit. This year's event was packed with Big Data practitioners across industries who shared their experiences and how they are driving new innovations like never before. But just because you weren't there doesn't mean you missed out.
In this session, we'll touch on a few of the key highlights from the show, including:
Key trends in Big Data adoption
The enterprise data hub
How the enterprise data hub is used in practice
Big Data Governance in Hadoop Environments with Cloudera Navigator, Feb 2017 meetup (Emre Sevinç)
This document discusses big data governance with Cloudera Navigator. It begins with an introduction to data governance and why it is important. It then introduces Cloudera Navigator, which provides unified auditing, comprehensive lineage, unified metadata, and universal policies for data governance. The presentation demonstrates Cloudera Navigator's features for lineage, metadata tagging, and auditing. It concludes by covering new features in Cloudera Navigator for cloud data governance and improved performance and usability.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... (DataStax)
This webinar covered graph databases and how they can solve problems that were previously difficult for traditional databases. It included presentations on why graph databases are useful, common use cases like recommendations and network analysis, different types of graph databases, and a demonstration of the DataStax Enterprise graph database. There was also a question and answer session where attendees could ask about graph databases and DataStax Enterprise graph.
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera (Cloudera, Inc.)
Transitioning to a Big Data architecture is a big step, and the complexity of moving existing analytical services onto modern platforms like Cloudera can seem overwhelming.
Building trust in your data lake. A fintech case study on automated data disc... (DataWorks Summit)
This talk walks through learnings from the HDP implementation at G-Research, a leading fintech company based in London.
The team at G-Research implemented the Hortonworks Data Platform to build a data lake and enable business teams to build analytics and machine learning tools. The team faced challenges in accurately controlling and managing sensitive data, and business teams were unable to search through data due to a lack of data classification.
G-Research implemented Privacera's auto-discovery solution to precisely discover and tag data as it is ingested into the HDP environment. The tags are pushed to Apache Atlas and then to Apache Ranger to enable tag-based policies. The G-Research team also built custom tools to push Spark lineage information into Atlas. Finally, Privacera's monitoring tools continuously analyze access audit information and raise alerts if sensitive data is moved to folders that might not be protected.
Consequently, the security team gained real visibility into sensitive data, and business users could search for and find data with appropriate data classification in place.
Speakers
Balaji Ganesan, Co-Founder and CEO, Privacera
Alberto Romero, Big Data Architect, G-Research
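The tag flow described above (discovery tags pushed to Apache Atlas, then enforced by Apache Ranger) can be sketched in a few lines. This is a minimal, hypothetical sketch: the endpoint path and payload shape follow the Atlas v2 REST API for adding classifications to an entity, but the host name, entity GUID, and tag attributes are placeholders, and the actual HTTP call is left out so the request-building step stays self-contained.

```python
import json

# Placeholder Atlas host; a real deployment would use its own URL and auth.
ATLAS_URL = "http://atlas.example.com:21000"

def classification_payload(tag_name, attributes=None):
    """Build the JSON body Atlas v2 expects when adding classifications."""
    return [{"typeName": tag_name, "attributes": attributes or {}}]

def classify_request(guid, tag_name, attributes=None):
    """Return (url, body) for POSTing a classification to an Atlas entity.

    Ranger can then enforce tag-based policies keyed on this classification.
    """
    url = f"{ATLAS_URL}/api/atlas/v2/entity/guid/{guid}/classifications"
    body = json.dumps(classification_payload(tag_name, attributes))
    return url, body

url, body = classify_request("1234-abcd", "PII", {"level": "high"})
print(url)
print(body)
```

A discovery job would issue one such POST per tagged dataset as it lands in the lake; Ranger picks up the classification via its Atlas tag-sync service.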
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret... (Cloudera, Inc.)
PRGX is the world's leading provider of accounts payable audit services and works with leading global retailers. As new forms of data started to flow into the organization, standard RDBMSs could not scale to handle them. Now, by using Talend with Cloudera Enterprise, PRGX achieves a 9-10x performance benefit in processing data, reduces errors, and provides more innovative products and services to end customers.
Watch this webinar to learn how PRGX worked with Cloudera and Talend to create a high-performance computing platform for data analytics and discovery that allows them to rapidly process, model, and serve massive amounts of structured and unstructured data.
This document discusses best practices for using Hadoop as an enterprise data hub. It provides an overview of how big data is driving new analytical workloads and the need for deeper customer insights. It discusses challenges with analyzing new sources of structured, unstructured and multi-structured data. It introduces the concept of a Hadoop enterprise data hub and data refinery to simplify access to new insights from big data. Key components of the data hub include a data reservoir to capture raw data from various sources, a data refinery to cleanse and transform the data, and publishing high value insights to data warehouses and other systems.
Rethink Analytics with an Enterprise Data Hub (Cloudera, Inc.)
Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
Big Data analytics is estimated to save over $450B in healthcare costs, and there is exciting adoption of big data platforms among healthcare payers and providers. Hadoop and cloud computing have emerged as some of the most promising technologies for implementing big data at scale for production healthcare workloads, using Hadoop as a service. Common considerations in the healthcare industry include privacy and data security, and the challenges of regulatory compliance with HIPAA and HITECH. Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables enterprises to deploy big data analytics without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze healthcare data while ensuring technical safeguards that help you remain in compliance.
MD Anderson Cancer Center implemented Hadoop to help manage and analyze big data as part of its big data program. The implementation included building Hadoop clusters to store and process structured and unstructured data from various sources. Lessons learned included that implementing Hadoop is a complex journey; teams should leverage existing strengths, collaborate openly, learn from experts, start with one cluster for multiple use cases, and follow best practices. Next steps include expanding the Hadoop platform, ingesting more data types, identifying high-value use cases, and developing and training people with new big data skills.
Siloed data is difficult to access and causes data consumers to have only partial views of the problem at hand. By limiting access to large volumes of disparate data, analysts and business users alike don’t have the ability to include important data in their reports and models, leading to suboptimal analytic outputs. Even when this data is available to countless users, traditional systems limit them to querying small volumes of data in order to return results in a timely manner.
Hortonworks Hybrid Cloud - Putting you back in control of your data (Scott Clinton)
The document discusses Hortonworks' solutions for managing data across hybrid cloud environments. It proposes getting all data under management, combating growing cloud data silos, and consistently securing and governing data across locations. Hortonworks offers the Hortonworks Data Platform, Hortonworks Dataflow, and Hortonworks DataPlane to provide a modern hybrid data architecture with cloud-native capabilities, security and governance, and the ability to extend to edge locations. The document also highlights Hortonworks' professional services and open source community initiatives around hybrid cloud data.
Necessity of Data Lakes in the Financial Services Sector (DataWorks Summit)
With the emergence of regulations such as the European Union's General Data Protection Regulation (effective May 2018), with fines of up to 20m Euro, data lakes are emerging as the data architecture of choice among financial institutions. Banks are embarking on a journey to enable data scientists to unlock the value of the data siloed in many disparate systems. By enabling self-service data access and merging multiple streams of data using data clustering, entity extraction, identity resolution, and other techniques, we will show how banks have used analytics to uncover business value without falling into the abyss of data swamps. The build-out of the data lake requires the ingestion of data from multiple operational systems. By leveraging an automated data cataloging service, delivered on the FICO Analytics Cloud, organizations are able to search, profile, discover, tag, track lineage, and capture tribal knowledge, enabling data scientists to build innovative models, make automated decisions, track fraudulent usage, run intelligent marketing campaigns, and improve the top and bottom lines for the financial institution.
Speaker:
Rohit Valia, Product Management and Strategy, Fico
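The identity-resolution step mentioned above can be illustrated with a toy sketch: records arriving from different operational systems are grouped under a normalized key so that one customer ends up in one cluster. Real pipelines use much fuzzier matching (phonetic encodings, edit distance, trained models); the field names and normalization rule here are illustrative assumptions, not anything from the talk.

```python
from collections import defaultdict

def normalize(record):
    """Derive a blocking key: lowercased name plus postcode without spaces."""
    return (record["name"].strip().lower(),
            record["postcode"].replace(" ", "").upper())

def resolve(records):
    """Group records from disparate sources under their normalized key."""
    clusters = defaultdict(list)
    for r in records:
        clusters[normalize(r)].append(r)
    return clusters

records = [
    {"name": "Ada Lovelace ", "postcode": "EC1A 1BB", "source": "crm"},
    {"name": "ada lovelace", "postcode": "ec1a1bb", "source": "payments"},
    {"name": "Charles Babbage", "postcode": "SW1A 2AA", "source": "crm"},
]
clusters = resolve(records)
print(len(clusters))  # two distinct customers across three source records
```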
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to... (Cloudera, Inc.)
What if…
…your data stores were limitless and accessible?
…data discovery was fast… really fast?
…connectivity was so seamless you could almost take it for granted?
And what if you could do all this with your preferred BI tool?
Learn how to integrate Cloudera Enterprise with SAP Lumira via embedded connectivity from Simba Technologies.
In this interactive webinar, experts from Cloudera, SAP, and Simba Technologies will introduce strategies for overcoming current data-discovery challenges, show you how to achieve powerful analytical insight, and demonstrate how to integrate Cloudera Enterprise with SAP Lumira.
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha... (Seeling Cheung)
Citizens Bank was implementing a BigInsights Hadoop Data Lake with PureData System for Analytics to support all internal data initiatives and improve the customer experience. Testing BigInsights on the ViON Hadoop Appliance yielded the productivity, maintenance, and performance Citizens was looking for. Citizens Bank moved some analytics processing from Teradata to Netezza for better cost and performance, implemented BigInsights Hadoop for a data lake, and avoided large capital expenditures for additional Teradata capacity.
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra (DataStax)
Apache Cassandra is the open source database technology that pioneered distributed data at scale. DataStax Enterprise, powered by the best distribution of Apache Cassandra, gives you up to 2x better compaction throughput, 3x better operational analytics performance, ease-of-use, and a secure, comprehensive multi-model data platform including search and operational analytics integrated with Cassandra to help you take on whatever challenges you might face along the way.
View recording: https://youtu.be/qLJyFydE-uY
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
How Cloudera SDX can aid GDPR compliance, 6.21.18 (Cloudera, Inc.)
Big data solutions from Cloudera can help organizations comply with the GDPR in three main ways:
1) Provide comprehensive encryption, access controls, and auditing to satisfy principles around integrity, confidentiality, and accountability.
2) Track the classification, usage, and lineage of personal data to demonstrate lawfulness, fairness, and transparency.
3) Enable capabilities like fast data updates, redaction, and erasure of individual records to comply with principles regarding purpose limitation, data minimization, accuracy, and storage limitation.
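Point 3 above, redaction and erasure of individual records, can be sketched as two small operations over a table of records. This is a minimal, hypothetical illustration: the field names, redaction marker, and `subject_id` key are assumptions for the example, not Cloudera SDX APIs.

```python
# Fields treated as personal data in this toy example.
PERSONAL_FIELDS = {"name", "email"}

def redact(record):
    """Replace personal fields with a marker, keeping non-personal fields."""
    return {k: ("<REDACTED>" if k in PERSONAL_FIELDS else v)
            for k, v in record.items()}

def erase_subject(records, subject_id):
    """Erasure: drop every record belonging to the given data subject."""
    return [r for r in records if r["subject_id"] != subject_id]

records = [
    {"subject_id": 1, "name": "Alice", "email": "a@example.com", "balance": 10},
    {"subject_id": 2, "name": "Bob", "email": "b@example.com", "balance": 20},
]
print(redact(records[0]))
print(len(erase_subject(records, 1)))  # one record remains after erasure
```

At platform scale the same two operations correspond to column masking policies and fast record-level deletes, respectively.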
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data (Cloudera, Inc.)
The document discusses how traditional analytics processes involve siloed data and platforms, long timelines for data discovery, and difficulties accessing and sharing data. It proposes that an Enterprise Data Hub (EDH) using Cloudera can help address these issues by providing unified storage for all types of data, shorter analytics lifecycles, and the ability to do more with data by using 100x more data and more types of data. The EDH allows organizations to use all of their data and gain insights sooner.
This document discusses the key updates and focus areas for Cloudera's upcoming C5.4 release, including improvements to data governance, open standards support, platform support, core scalability, and enterprise security. Some highlights include expanded data lineage tracking, support for new cloud platforms, performance optimizations, and integration with xPlain.io for data modeling and query troubleshooting. The release will also include updates to core components like HDFS, HBase, Hive, Impala and Spark to improve scalability, stability, and production readiness.
A Modern Data Strategy for Precision Medicine (Cloudera, Inc.)
Genomics is upon us, made possible by big data and the technologies designed to support it. Doctors, who historically used clinical data, and researchers, who historically used genomic data, are now increasingly focused on analyzing the same single data set: introducing the opportunity to share bodies of knowledge, fostering collaborative innovation, and driving toward higher standards of care.
However, this data is enormous – volumes of genomic data are expected to reach two to four exabytes per year by 2025, as the cost of genetic sequencing has decreased 100-fold over the past 10 years.
Cloudera is helping solve the big data problem with its Apache Hadoop-based platform for large-scale data processing, discovery, and analytics; putting precision medicine within reach.
This document discusses Cloudera's training, services, and support offerings for Hadoop and big data. It provides an overview of Cloudera University for role-based training courses, professional certifications, and e-learning. It also describes options for on-demand, virtual live classroom, private on-site, and public live classroom training. Additional sections outline Cloudera's professional services for optimizing Hadoop implementations at every stage and dedicated support engineers for federal customers.
Comprehensive solutions for data integration and advanced analytics (Gauss Algorithmic)
Gauss Algorithmic provides comprehensive data integration and advanced analytics solutions using best-in-class open source technologies. They help businesses analyze their data to find answers to important questions through services like data integration, building big data infrastructures, data analytics using machine learning and AI, and data monetization. Their team of over 17 data and analytics experts builds customized solutions on Cloudera's data platform and partners with companies in related fields.
2016 Cybersecurity Analytics State of the Union (Cloudera, Inc.)
3 Things to Learn About:
-Ponemon Institute's 2016 big data cybersecurity analytics research report
-Quantifiable returns organizations are seeing with big data cybersecurity analytics
-Trends in the industry that are affecting cybersecurity strategies
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado... (Seeling Cheung)
The document summarizes the experience of Fiducia & GAD IT AG in bringing Hadoop to their enterprise for fraud detection purposes. They faced challenges of handling high volumes of transaction data in real-time for model-based fraud evaluation. Their solution was to implement an Apache Hadoop platform to address the velocity, variety and volume of transaction data. Key lessons learned included that Hadoop is a complex platform requiring new skills, ongoing support is critical, and standard tasks can generate significant effort. Their blueprint recommends starting with a simple use case, few components, agile development, and budgeting time for training and bug fixing when establishing a big data platform.
Hitachi Data Systems Hadoop Solution
Customers are seeing exponential growth of unstructured data, from their social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, the latest software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, is giving customers a new option to tackle data growth and deploy big data analysis to help better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera, and Hitachi Consulting will present together and explain how to get there. Attend this WebTech and learn how to: solve big data problems with Hadoop; deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data; and implement Hadoop using the HDS Hadoop reference architecture. For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
Hadoop-based data lakes have become increasingly popular within today’s modern data architectures for their scalability, ability to handle data variety, and low cost. Many organizations start slowly with their data lake initiatives, but as these grow bigger, they face challenges with data consistency, quality, and security, and lose confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes, their relationship with productivity, and how they help organizations meet regulatory and compliance requirements. The talk advocates a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
However, this data is enormous: volumes of genomic data are expected to reach two to four exabytes per year by 2025, driven in part by a 100-fold decrease in the cost of genetic sequencing over the past 10 years.
Cloudera is helping solve the big data problem with its Apache Hadoop-based platform for large-scale data processing, discovery, and analytics; putting precision medicine within reach.
This document discusses Cloudera's training, services, and support offerings for Hadoop and big data. It provides an overview of Cloudera University for role-based training courses, professional certifications, and e-learning. It also describes options for on-demand, virtual live classroom, private on-site, and public live classroom training. Additional sections outline Cloudera's professional services for optimizing Hadoop implementations at every stage and dedicated support engineers for federal customers.
Comprehensive solutions for data integration and advanced analyticsGauss Algorithmic
Gauss Algorithmic provides comprehensive data integration and advanced analytics solutions using best-in-class open source technologies. They help businesses analyze their data to find answers to important questions through services like data integration, building big data infrastructures, data analytics using machine learning and AI, and data monetization. Their team of over 17 data and analytics experts builds customized solutions on Cloudera's data platform and partners with companies in related fields.
2016 Cybersecurity Analytics State of the UnionCloudera, Inc.
3 Things to Learn About:
-Ponemon Institute's 2016 big data cybersecurity analytics research report
-Quantifiable returns organizations are seeing with big data cybersecurity analytics
-Trends in the industry that are affecting cybersecurity strategies
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Seeling Cheung
The document summarizes the experience of Fiducia & GAD IT AG in bringing Hadoop to their enterprise for fraud detection purposes. They faced the challenge of handling high volumes of transaction data in real time for model-based fraud evaluation. Their solution was to implement an Apache Hadoop platform to address the velocity, variety, and volume of transaction data. Key lessons learned included that Hadoop is a complex platform requiring new skills, that ongoing support is critical, and that standard tasks can generate significant effort. Their blueprint recommends starting with a simple use case, a few components, agile development, and budgeting time for training and bug fixing when establishing a big data platform.
Hitachi Data Systems Hadoop Solution. Customers are seeing exponential growth of unstructured data, from social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, a software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, gives customers a new option to tackle data growth and deploy big data analysis to better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, pre-tested with the Cloudera Hadoop distribution to provide faster time to market for customers deploying Hadoop applications. HDS, Cloudera, and Hitachi Consulting will present together and explain how to get there. Attend this WebTech and learn how to:
- Solve big data problems with Hadoop.
- Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.
- Implement Hadoop using the HDS Hadoop reference architecture.
For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
Hadoop-based data lakes have become increasingly popular within today's modern data architectures for their scalability, ability to handle data variety, and low cost. Many organizations start slowly with data lake initiatives, but as the lakes grow, they face challenges with data consistency, quality, and security, eroding confidence in the initiative.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes, the relationship between governance and productivity, and how governance helps organizations meet regulatory and compliance requirements. The talk advocates adopting a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
In this webinar, Carl W. Olofson, Research Vice President, Application Development and Deployment for IDC, and Dale Kim, Director of Industry Solutions for MapR, will provide an insightful outlook for Hadoop in 2015, and will outline why enterprises should consider using Hadoop as a "Decision Data Platform" and how it can function as a single platform for both online transaction processing (OLTP) and real-time analytics.
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
To learn more about Mainframe solutions from CA Technologies, visit: http://bit.ly/1wbiPkl
Big data is a field that deals with large and complex datasets that cannot be processed by traditional methods. It has characteristics including volume, variety, velocity, variability, and veracity. Hadoop is an open-source software framework for distributed storage and processing of big data using MapReduce and HDFS. Common big data platforms include Hadoop, Cloudera, Amazon Web Services, Hortonworks, and MapR, which integrate tools for storage, analysis, and management of large datasets.
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
This document discusses building a modern analytic database with Cloudera. It outlines Marketing Associates' evaluation of solutions to address challenges around managing massive and diverse data volumes. They selected Cloudera Enterprise to enable self-service BI and real-time analytics at lower costs than traditional databases. The solution has provided scalability, cost savings of over 90%, and improved security and compliance. Future roadmaps for Cloudera's analytic database include faster SQL, improved multitenancy, and deeper BI tool integration.
Cisco Big Data Warehouse Expansion Featuring MapR DistributionAppfluent Technology
The document discusses Cisco's Big Data Warehouse Expansion solution featuring MapR Distribution including Apache Hadoop. The solution reduces data warehouse management costs by enabling organizations to store and analyze more data at lower costs. It does this by offloading infrequently used data from the existing data warehouse to low-cost big data stores running on Cisco UCS hardware optimized for MapR Distribution. This provides benefits like enhanced analytics, improved performance, reduced costs and risks, and competitive advantages from being able to utilize more company data assets.
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
In recent years, Apache™ Hadoop® has emerged from humble beginnings to disrupt the traditional disciplines of information management. As with all technology innovation, hype is rampant, and data professionals are easily overwhelmed by diverse opinions and confusing messages.
Even seasoned practitioners sometimes miss the point, claiming for example that Hadoop replaces relational databases and is becoming the new data warehouse. It is easy to see where these claims originate since both Hadoop and Teradata® systems run in parallel, scale up to enormous data volumes and have shared-nothing architectures. At a conceptual level, it is easy to think they are interchangeable, but the differences overwhelm the similarities. This session will shed light on the differences and help architects, engineering executives, and data scientists identify when to deploy Hadoop and when it is best to use MPP relational database in a data warehouse, discovery platform, or other workload-specific applications.
Two of the most trusted experts in their fields, Steve Wooledge, VP of Product Marketing from Teradata and Jim Walker of Hortonworks will examine how big data technologies are being used today by practical big data practitioners.
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
This document summarizes a presentation given by Nicholas Berg of Seagate and Adriana Zubiri of IBM on delivering analytics across organizations using Hadoop and SQL. Some key points discussed include Seagate's plans to use Hadoop to enable deeper analysis of factory and field data, the evolving Hadoop landscape and rise of SQL, and a performance comparison showing IBM's Big SQL outperforming Spark SQL, especially at scale. The document provides an overview of Seagate and IBM's strategies and experiences with Hadoop.
This document outlines Infochimps' big data solutions. It discusses common big data problems around scaling, time, reliability, efficiency, staffing and data sourcing. It then describes Infochimps' platform which uses technologies like Ironfan, Wukong and partners to provide data infrastructure, analytics and a marketplace. Services include implementation, hosting, support and consulting. Infochimps differentiates itself by offering a complete solution while leveraging data augmentation and expertise to address clients' big data challenges.
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
Sai Paravastu discusses the benefits of using an open data platform (ODP) for enterprises. The ODP would provide a standardized core of open source Hadoop technologies like HDFS, YARN, and MapReduce. This would allow big data solution providers to build compatible solutions on a common platform, reducing costs and improving interoperability. The ODP would also simplify integration for customers and reduce fragmentation in the industry by coordinating development efforts.
Unlock Big Data's Potential in Financial Services with Hortonworks Pactera_US
Pactera and Hortonworks introduce their partnership and Hortonworks' approach to enterprise Hadoop. They discuss how financial institutions can use big data and a polyglot approach to gain insights from various data types for applications like fraud detection, gaining a 360 degree view of customers, and risk analysis. Specific use cases discussed include using big data for insurance underwriting, website optimization, and getting a holistic view of customer interactions. Pactera then outlines its big data capabilities and how it can help clients through workshops, proofs of concept, and implementation.
Govern This! Data Discovery and the application of data governance with new s...Cloudera, Inc.
Join Tableau and Cloudera to learn how to apply governance to the discovery layer in an enterprise data hub while still meeting the speed and agility requirements of the business user.
This document discusses big data and Hadoop. It defines big data as high volume data that cannot be easily stored or analyzed with traditional methods. Hadoop is an open-source software framework that can store and process large data sets across clusters of commodity hardware. It has two main components - HDFS for storage and MapReduce for distributed processing. HDFS stores data across clusters and replicates it for fault tolerance, while MapReduce allows data to be mapped and reduced for analysis.
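The replication-for-fault-tolerance behavior mentioned above is controlled in HDFS by the `dfs.replication` property, whose default is 3. A minimal `hdfs-site.xml` sketch (the value shown is illustrative, not a recommendation):

```xml
<!-- hdfs-site.xml: each HDFS block is stored on this many DataNodes;
     the default of 3 lets a block survive the loss of two nodes. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```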
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
The document discusses the enterprise data hub (EDH) as a new approach for data management. The EDH allows organizations to bring applications to data rather than copying data to applications. It provides a full-fidelity active compliance archive, accelerates time to insights through scale, unlocks agility and innovation, consolidates data silos for a 360-degree view, and enables converged analytics. The EDH is implemented using open source, scalable, and cost-effective tools from Cloudera including Hadoop, Impala, and Cloudera Manager.
Lowering the entry point to getting going with Hadoop and obtaining business ...DataWorks Summit
SAS is a leader in advanced analytics with over 40 years of experience. They provide tools to manage, explore, develop models, and deploy analytics from, with, and within Hadoop. This allows customers to realize value from Hadoop throughout the entire analytics lifecycle. SAS helps address challenges like Hadoop skills shortages and tools not being optimized for big data. They demonstrated identifying reasons for abandoned shopping carts using Hadoop and SAS analytics tools.
Scaling Data overview
1.
2. Confidential and Proprietary of Scaling Data All Rights Reserved
Scaling Data introduction
What is “Big Data”?
Hadoop Capabilities and Uses
Hadoop and its use in Analytics
SSN overview and analytics direction
Next steps
3.
• Partnership comprised of seasoned Big Data, Hadoop, financial services, and security entrepreneurs
• Focused on extracting value from ALL your data
• Services include:
− Data Discovery Assessments
− Strategy Development
− Hadoop Implementation
− Hosted Hadoop Environment
− Advanced Analytics Development
4.
FLEXIBILITY
Commoditization of Distributed Computing
SCALABILITY
Distributed Data Processing
Competitive Advantage
SECURITY
Hardened Servers
World-Class Encryption
5.
• Scaling Data focuses on Big Data problems in the financial services arena.
• We provide data discovery, capture, analysis, and strategies that allow organizations to better leverage ALL current and historical data beyond traditional relational and BI limitations.
• Hadoop Hosting
6.
Scaling Data solutions focus on the following Big Data industries:
• Financial Services
− Security/AML/Fraud
− Payments Analysis
• Retail
− Spend Analysis
− Pricing Optimization
• Telecom and Utilities
− Smart Grid Analysis
− Pricing Optimization
8.
Relational Databases:
• ACID system
• Stores tables (schema)
• Stores single-digit terabytes
• Processes GBs per query
• SQL
• Interactive response
• Low latency
Hadoop:
• A distributed operating system for data analysis
• Stores files (structured and unstructured)
• Stores dozens of petabytes
• Queries and data processing
• Batch response (>30 sec)
• HBase allows for low-latency queries, but you lose SQL
Hadoop is good for storing and processing large amounts of unstructured or structured data in batch form.
HBase is the tool to use for petabyte-size, low-latency applications.
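The batch-processing model behind this comparison can be sketched with a word count, the canonical MapReduce example. This is a minimal in-process simulation of the map, shuffle, and reduce phases; a real Hadoop job would distribute these phases across a cluster (for example via Hadoop Streaming), and the input lines here are made up for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop stores files", "Hadoop processes files in batch"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"], counts["files"])  # 2 2
```

The same mapper and reducer, written to read stdin and write stdout, could run unchanged over terabytes of files; that separation of per-record logic from cluster-scale orchestration is what makes the batch model scale.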
9.
Companies that use Hadoop can expect the following:
• 70% are more confident in their ability to manage large data
• 88% can perform more analysis on large data
• 88% can keep more historical records
• 94% can analyze data in greater detail
• 82% can capture and use all source data
Source: Ventana Research
Confidential and Proprietary of Scaling Data All Rights Reserved
12.
• Efficiently execute sophisticated analytics
Supports real-time transaction processing; handles thousands of transactions a second. Leverage the platform's comprehensive range of analytic capabilities.
• Leverage packaged capabilities and open analytics
Balance the need for proven, off-the-shelf analytics with the capability to develop new rules/models with easy-to-use graphical tools.
• Drive process efficiencies
Automate and streamline investigations, with alert generation and comprehensive workflow and investigation management.
• Adapt to changing organizational needs
Adapt logic, processing, and policies with user-friendly controls and tools, with and without IT support. New solutions can easily be deployed on the common platform to meet changing business needs.
• Yield faster returns
Proven, out-of-the-box analytics detect and prevent issues immediately. Speed implementation with flexible data mapping to legacy environments and a data-source-agnostic architecture.
Note the difference between SAN storage and commodity disk: a gigabyte of storage in Hadoop runs about $0.25 per month, versus roughly $1.00 per month in a traditional database environment.
Hadoop is not a replacement for Oracle and MySQL; you offload the tasks they do not handle well.
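The storage-cost gap in these notes can be made concrete with quick arithmetic. This sketch uses the per-gigabyte monthly figures quoted above ($0.25 on Hadoop commodity disk versus $1.00 on SAN-backed database storage); the 50 TB archive size and the helper function are hypothetical, chosen only for illustration.

```python
def monthly_cost(gb, rate_per_gb):
    """Monthly storage cost for a given volume at a flat per-GB rate."""
    return gb * rate_per_gb

archive_gb = 50_000  # hypothetical 50 TB of cold data offloaded to Hadoop
hadoop = monthly_cost(archive_gb, 0.25)  # commodity-disk rate from the slides
san = monthly_cost(archive_gb, 1.00)     # SAN-backed database rate

print(hadoop, san, san - hadoop)  # 12500.0 50000.0 37500.0
```

At these rates, offloading that archive saves $37,500 per month, which is the economic argument behind using Hadoop as a landing zone for data the relational tier handles poorly.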
• Gathers data from multiple sites
• Industry-customized algorithms
• Flexible/scalable platform
• Ability to surface highly unique trends
• Ability to store and analyze petabytes of data