The document discusses strategies for organizations to better manage big data when resources are limited. It recommends identifying unused data in the data warehouse in order to reduce costs by moving that data to cheaper platforms like Hadoop. Organizations can save millions by offloading data that is not frequently queried but must be retained for regulatory reasons. The document also suggests purging data that is not needed at all to further reduce storage and management costs. Proper classification and placement of data onto platforms suited to its usage level and type, such as Hadoop for less critical datasets, can help organizations get more value from their data with fewer resources.
CONTENTS

Long on Data, Short on Resources  1
Know Your Data  3
  Reducing Data Maintenance Costs  5
Choose Your Data Platform Wisely  8
  Reining in Data Growth Costs  10
Don't Keep What You Don't Need  11
  Overcoming Data Growth and Regulatory Compliance Challenges  12
Getting What You Need to Manage Your Data  14
For More Info  15
Big Data Management: Work Smarter Not Harder

LONG ON DATA, SHORT ON RESOURCES

We've been deluged with statistics on data's rapid growth, to the point that the numbers and bytes have become almost meaningless. No one would deny that data growth is an unstoppable trend. But that's not the issue. The real issue is how organizations can make big data meaningful when IT resources are shrinking.

The good news is that business users want more data, and they're getting it. But in some cases, more data is actually having an adverse effect on business. Fifty-six percent of IT decision makers surveyed by IDG said that their users frequently or occasionally report feeling overwhelmed by incoming data and information, while 53% said the influx of large quantities of data has delayed important decisions because they didn't have the right tools to properly manage it. Leading companies are realizing that having the right technology makes all the difference in ensuring that data can be used as an asset rather than a liability.

Source: IDG Enterprise, 2015
However, despite the perceived value of data, the allocation of resources to manage and leverage big data has not kept pace with its growth. According to research firm Computer Economics, data management staff as a percentage of IT staff has risen a meager 0.5% in four years, and IT spending per user continues to decline. In fact, the same study showed that, when adjusted for inflation, spending per user decreased from $10,514 in 2012 to just $6,847 in 2015.

But it's not all about the money. Finding people with the necessary skillsets will only grow more challenging. The McKinsey Global Institute predicts that by 2018 the US could face a shortage of 140,000-190,000 people with deep analytical skills, as well as a deficit of 1.5 million people who can leverage big data analysis to make effective decisions. This drives the need for automation that reduces the skills and training required to manage data.

As is often the case, the best way to address the big data resource and skills shortage is to work smarter, not harder. In this ebook, we look at how IT organizations can manage data smarter, while maintaining or even reducing costs, so that business users can get real value from data, faster and easier.

Source: Computer Economics, 2015
KNOW YOUR DATA

Moving data, transforming data, and making it available to the business is a very expensive process. Given data's rapid rate of growth, and the amount of waste in the current data management paradigm, it's time to transform the economics of data.

Most enterprises leverage a wide variety of data types in high volumes for big data analytics projects. These include social media data, internal data, log data, mobile device data, sensor data, free public external data, and the list keeps growing. In fact, according to QuinStreet Research, by 2020 the world will generate 50 times as much data as it does today, but the IT staff responsible for managing it will only grow 1.5 times. On top of that challenge, only 40-55% of the data that they load is ever used. When you consider that it costs $2-6 million to support every 50-100 TB of new data, supporting dormant data results in a tremendous amount of inefficiency.

Source: QuinStreet Enterprise Research, 2014
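To see the scale of that inefficiency, the cited ranges can be turned into a rough back-of-the-envelope estimate. This is a sketch, not a figure from the ebook: the warehouse size is a made-up example, and the midpoints are my own simplification of the ranges above.

```python
# Back-of-the-envelope estimate of the cost of dormant data, using the
# midpoints of the ranges cited above. The warehouse size is hypothetical.
cost_per_tb = 4_000_000 / 75   # midpoint of "$2-6M per 50-100 TB of new data"
loaded_tb = 500                # hypothetical total data loaded
used_fraction = 0.475          # midpoint of "only 40-55% is ever used"

dormant_tb = loaded_tb * (1 - used_fraction)
dormant_cost = dormant_tb * cost_per_tb
print(f"Dormant data: {dormant_tb:.1f} TB, costing about "
      f"${dormant_cost / 1e6:.1f}M to support")
```

Even under these illustrative assumptions, more than half the warehouse is never queried, and supporting it consumes a double-digit share of millions in spend.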
Dormant data also slows down performance, since the process of loading data uses up to 60% of the CPU. A lot of data may need to be retained in its original form for compliance, or undergo ETL and transformation in the hope that it will serve other needs, yet never get used. As a result, it unnecessarily impacts both costs and performance.

But the exorbitant cost of not managing dormant data well isn't just about the storage. In fact, it's less about the storage and more about CPU capacity. Most vendors charge by CPU capacity; as CPU capacity increases, so do your licensing costs.

Data waste: only 40-55% of the data companies load will ever be used.
Cost of supporting data: every 50-100 TB of new data costs $2-6 million to support.
Cost of CPU: loading data uses up to 60% of the CPU, and license costs go up as CPU capacity increases.

Source: Based on Attunity customer implementations/input worldwide, 2015
CUSTOMER SUCCESS STORY
By offloading 43% of the EDW into Hadoop, yearly maintenance
costs decrease from $21M to $5M (in three years!).
Source: Based on Attunity customer implementations/input worldwide, 2015
Reducing Data Maintenance Costs
By looking at and analyzing
EDW use for just one month, an
Attunity customer discovered
that 37 TB of data — 43% of the
EDW — didn’t receive any kind
of analytical query. And yet the
CPU consumption to ingest and
load the data was over 60%.
By offloading that 43% into Hadoop, the customer dramatically
decreased the need for more capacity, reduced the number of EDW
nodes and lowered maintenance costs. In fact, the customer is looking
at driving down yearly maintenance costs from $21 million to $5
million in just three years — all by being more strategic about data
management.
The data warehouse is a
reflection of the business. It
grows in response to business
needs. It makes sense then to
analyze data activity and usage
accordingly. When you group
applications, data, or users in
the context of the business (for
example, by department or line
of business), you can then begin
to analyze utilization and assign
accountability via chargeback or
showback. For example, when
marketing requests more data
from IT, the IT department may
need to show them how much
data hasn’t been used, along
with the cost to continue to
manage current and new data.
When a business can specify how much it costs to load and maintain
data, and demonstrate how much isn’t being used, the dataset that
seemed so important before may lose some of its significance. The
standing request might just lose its urgency, particularly if the cost to
keep the data comes out of departmental budgets and ROI is lacking.
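As a rough illustration of showback, the sketch below tallies total and dormant warehouse cost by department. The table list, department names, and the cost-per-TB figure are hypothetical; in practice these inputs would come from usage analysis of the warehouse itself.

```python
# A minimal showback sketch, assuming a hypothetical list of per-table usage
# records already extracted from the warehouse's query log. Table names,
# departments, and the cost-per-TB figure are illustrative assumptions.
from collections import defaultdict

COST_PER_TB_PER_YEAR = 50_000  # assumed fully loaded cost of EDW capacity

tables = [
    # (department, table, size_tb, queries_last_month)
    ("marketing", "campaign_clicks", 12.0, 340),
    ("marketing", "legacy_leads",     8.5,   0),
    ("finance",   "gl_postings",      4.0, 120),
]

report = defaultdict(lambda: {"cost": 0.0, "dormant_cost": 0.0})
for dept, _table, size_tb, queries in tables:
    cost = size_tb * COST_PER_TB_PER_YEAR
    report[dept]["cost"] += cost
    if queries == 0:  # dormant: loaded and maintained, but never queried
        report[dept]["dormant_cost"] += cost

for dept, r in sorted(report.items()):
    print(f"{dept}: ${r['cost']:,.0f}/yr total, ${r['dormant_cost']:,.0f} of it dormant")
```

A report like this is what lets IT put a price tag on a department's standing data requests.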
To figure out what's used, look at what's been queried:
43% of data in the data warehouse never received a
single analytical query in a month.
Source: Based on Attunity customer implementations/input worldwide, 2015
Identifying dormant data recovers storage capacity. But it
also helps reduce costs related to loading and transforming the
data. If you don't need the data anymore, you can stop loading
it, which means you eliminate a portion of the ETL processes that
are consuming CPU capacity. If you do need the data, say for regulatory
reasons, you can offload the ETL processes that load and transform the
data onto a lower-cost Hadoop cluster. You not only recover storage
capacity, but you also consume less CPU capacity on the system because
of all the data that you're not loading and ingesting into an EDW.
The key is to gain visibility into the EDW to learn what data is used and
what data is unused.
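To make "what's been queried" concrete, here is a minimal sketch of flagging dormant tables from a month of query-log text. The log lines, table inventory, and crude name matching are all illustrative assumptions; a real warehouse exposes usage through its system catalogs or a dedicated tool such as Attunity Visibility.

```python
# Sketch: find dormant tables by checking which inventory tables appear
# in a month of logged SQL. The inventory and log entries are hypothetical.
import re

inventory = {"sales_2015", "web_logs", "audit_trail", "orders"}

query_log = [
    "SELECT * FROM orders WHERE order_date >= '2015-06-01'",
    "SELECT count(*) FROM web_logs JOIN orders ON web_logs.order_id = orders.id",
]

referenced = set()
for sql in query_log:
    # Crude extraction: any inventory table name appearing as a whole word
    for table in inventory:
        if re.search(rf"\b{table}\b", sql):
            referenced.add(table)

dormant = inventory - referenced
print(sorted(dormant))  # tables that received no analytical query this month
```

Tables that land in `dormant` are the candidates for offloading or purging discussed in the sections that follow.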
As data grows, the platforms that support it increase
in size and multiply because different platforms
optimize different workloads. That’s why placing
data on the right platform is critical to efficiently
managing data as a strategic asset. Enterprises
can realize significant benefits by modernizing and
optimizing data placement.
Not all data is created equal. Some data is of high
value and used for complex analytics while other
data is kept primarily for regulatory purposes —
and then there’s all the data in between. A dataset
should be moved to the most appropriate platform
based on its use case.
CHOOSE YOUR DATA PLATFORM WISELY
Data that's being loaded, but you don't need for the business:
archive or throw away.
Datasets that are being utilized, but don't require a high-end data
warehouse: load and run batch analytics in Hadoop.
Data that should be maintained, but not used for analytics:
load and maintain in Hadoop.
There are three general types of data platforms:

Enterprise data warehouse
An enterprise data warehouse (EDW) is appropriate for frequently
accessed, high-value data used for complex analytics. EDWs are
high-end engineered systems designed specifically for complex
analytics and many simultaneous users — and they're priced
accordingly. An EDW is a great place to leverage high-value data,
but it isn't the ideal place to store data that you don't plan to
use anytime soon.

Data mart
A data mart is more focused than a data warehouse, consolidating
information for a particular subject area (such as sales or finance).
Data marts may be fed by data from a data warehouse or from multiple
source systems, and tend to be hosted on typical, run-of-the-mill
servers.

Hadoop
Hadoop is suitable for structured, unstructured, and semi-structured
data, and can run on premises or in the cloud. Hadoop is a great
place to load and maintain high volumes of data that should be kept
but isn't typically needed for frequent, high-end analytics
supporting many simultaneous users.

Moving data that's not queried but still needs to be maintained into
a lower-cost platform like Hadoop can help support and balance data
growth. As a result, an enterprise can reduce the need for more
storage capacity and the number of EDW nodes, which lowers both
maintenance costs and costs related to adding more capacity.

The key is to figure out what you're loading into each of these
systems, and move data as necessary to the most appropriate platform.
CUSTOMER SUCCESS STORY
Online Travel Company: optimized data and workloads for its Hadoop
cluster, reduced its data footprint on the EDW by 30-40%, and
achieved 10X in cost savings.
Reining in Data Growth Costs
An online travel company’s 6+ petabyte production IT systems were
growing rapidly within a multi-platform environment that included
Hadoop and several legacy data warehouse systems. The DB2 data
warehouse was already at 300 TB, and adding more capacity was simply
cost prohibitive.
Using Attunity Visibility to balance workloads and data across the data
warehousing environment had a significant impact on costs associated
with data growth. The online travel company reduced its data footprint
on the EDW by 30-40%. Offloading data and associated workloads to
Hadoop saved the company $6 million.
Furthermore, its IT department
can ensure that these cost
savings are maintained by
providing chargeback reports
to business lines. By showing
business users what data is being
used and at what cost, IT can
make a case for moving data to
lower-cost platforms or making
additional investments in IT.
DON'T KEEP WHAT YOU DON'T NEED
Even as you move data to the
appropriate platform, it behooves you
to consider whether it’s necessary to
keep specific datasets at all. There’s
great potential to lower costs by
purging unused data. Many Attunity
customers report that more than one-
third of data in the data warehouse
never receives a single analytical query
in a month. That’s a huge chunk of data
— and potential cost savings.
In order to determine what data is
worth keeping, IT must analyze data
usage and collaborate across teams to
classify data into four categories:
Category 1: Data that doesn’t need to be kept at all and
can be purged. This data isn’t used for analysis, and it
doesn’t need to be archived.
Category 2: Data that must be kept for
regulatory or other reasons but isn’t being
used for analytical purposes. These datasets
do not require a high-end engineered EDW.
They can be placed in a Hadoop cluster or something less
cost prohibitive. Hadoop is a perfect option because it’s a less
expensive system that allows you to continue to do all the data
processing and maintenance and still have access to the data,
because it’s a live platform. So when you do need the data,
you can access it directly in Hadoop or move it into the data
warehouse for analysis on premises or in the cloud.
CUSTOMER SUCCESS STORY
Large Financial Institution: capped IT infrastructure investment at
existing capacity, avoided $15M in upgrade costs, and is ready to
handle faster rates of data growth in the future.
Overcoming Data Growth and Regulatory Compliance Challenges
Data growth made it difficult for a leading national bank to manage data
and maintain regulatory compliance. With data growing at 100-150% a
year, the bank was quickly running out of capacity. It expected to spend
$10–15 million in 12–18 months on hardware upgrades. Meanwhile,
IT had no way of tracking who accessed what data at the table and
column level, which is necessary to fulfill regulatory compliance and
audit requests. Attunity Visibility enables the IT organization to make
informed decisions about the datasets and related workloads that can
be rebalanced and optimized with Hadoop. As a result, the institution
capped its IT infrastructure investment at existing capacity to avoid
$15 million in upgrade costs while also empowering its teams to handle
faster rates of data growth in
the future. Attunity Visibility also
helps the bank meet regulatory
compliance requirements and
respond to audit requests in
a timely manner. The solution
identifies user activity related
to specific customer data at a
granular level and generates
weekly audit reports.
Category 3: Datasets that are analyzed but don’t require an engineered
EDW, such as large-scale data extracts for offline analytics. SAS is a
good example. Many SAS users access data that’s in a data warehouse,
but they don’t do the analytics in the data warehouse. Instead, they
extract huge amounts of data into the SAS server for data mining. This
use case doesn’t require an engineered system like an EDW. Hadoop
does a great job for batch analytics, and it costs less. You can pull huge
streams of data back to the SAS server and analyze it there.
Category 4: Data that’s widely and repeatedly leveraged by the
business, and therefore suitable for storage in your EDW.
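The four categories can be sketched as a simple decision function. The attribute names below (`queried`, `must_retain`, `needs_edw`) are illustrative stand-ins for the usage analysis described above, not fields from any particular tool:

```python
# Sketch of the four-way triage: each dataset carries three hypothetical
# attributes gathered from usage analysis, and the function maps them to
# one of the four categories in the text.

def categorize(queried: bool, must_retain: bool, needs_edw: bool) -> str:
    if not queried and not must_retain:
        return "Category 1: purge"
    if not queried and must_retain:
        return "Category 2: load and maintain in Hadoop"
    if queried and not needs_edw:
        return "Category 3: batch analytics in Hadoop"
    return "Category 4: keep in the EDW"

print(categorize(queried=False, must_retain=False, needs_edw=False))
print(categorize(queried=False, must_retain=True,  needs_edw=False))
print(categorize(queried=True,  must_retain=True,  needs_edw=False))
print(categorize(queried=True,  must_retain=True,  needs_edw=True))
```

In practice the flags would be set iteratively with stakeholders rather than computed mechanically, which is the collaboration the next paragraph describes.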
In order to categorize data,
you need to understand what
the datasets are and what
users are doing with them.
You must then get buy-in from
the stakeholders. Show usage
patterns to the business and
collaborate with them to make
decisions in an iterative fashion.
Over time, the returns are
significant.
GETTING WHAT YOU NEED TO MANAGE YOUR DATA
Effective data management requires two primary capabilities:

1. Integrate and move data more easily across all major relational
database systems, enterprise data warehouses, and cloud and big data
platforms.

2. Tune performance, optimize data placement, and reduce costs with
metrics on how the business is utilizing data and platform resources.
In addition to getting real value out of data, effective data management
enables IT organizations to reduce big data costs. With visibility into
how data is used, IT can work with the business to make informed
decisions about what data is worth keeping and how it should be
stored, and what data can be purged or archived. This practice has even
enabled some IT organizations to cap their IT infrastructure investments
at existing capacity.
Being called on to do more with
less is nothing new for IT. Time
and again, IT organizations
learn to work smarter and leaner
while delivering key services
to the business. Big data
analytics is no different.
[Diagram: Effective Data Management Capabilities (Prepare Data, Move
Data, Analyze Usage)]