Big Data and analytics workloads represent a new frontier for organizations. Data is being collected from sources that did not exist 10 years ago: mobile phone data, machine-generated data, and website interaction data are all being collected and analyzed. At the same time, IT budgets are already under pressure while Big Data footprints keep growing, posing a huge storage challenge.
This paper describes the issues that Big Data applications pose for storage systems and how choosing the correct storage infrastructure can streamline and consolidate Big Data and analytics applications without breaking the bank.
InfiniBox bridges the gap between high performance and high capacity for Big Data applications. InfiniBox allows an organization implementing Big Data and analytics projects to truly attain its business goals: cost reduction, continual and deep capacity scaling, and simple and effective management, without any compromise in performance or reliability. All of this effectively and efficiently supports Big Data applications at a disruptive price point.
Learn more at www.infinidat.com.
Introduction: Big Data Stresses Storage Infrastructure
Big Data applications have caused an explosion in the amount of data that an organization needs to keep online and analyze, driving up the cost of storage as a percentage of the overall IT budget. A recent study of Big Data and analytics-driven organizations identified these top five use cases behind this data growth:
1. Enhanced customer service and support
2. Digital security, intrusion detection, fraud detection and prevention
3. Operational analysis
4. Big Data exploration
5. Data warehouse augmentation
Enhanced customer service is always on the minds of organizations, both large and small. “How can I help improve the relationship with my customer? I know my customer will choose to buy from me, and buy more often, if I cultivate and enhance the relationship.” Typically this is preference data gathered from various sources, such as online navigation and search criteria. Collected throughout the customer lifecycle, this data is mined and stored alongside customer records.
Digital security and intrusion detection are very important to customers. This data is collected and analyzed in real time and is typically machine generated. The analytic results must be returned immediately for this activity to be relevant, which requires fast storage with lots of capacity, as machine-generated and sensor data can consume large amounts of space.
Operational analysis involves collecting data, often sensor-based and machine-generated, and using it to identify areas of operational improvement and to conduct fault isolation and break-fix analysis. Manufacturing firms collect up-to-the-second data about robotic activities on their shop floor and want to know not just the status, but how they can improve the process. Like intrusion detection, this data is generated and analyzed in real time, and the results must be stored and sent back up the chain to be actionable. Unlike intrusion detection, all of the data is interesting: it reveals machine and process trends that can be of use later.
Big Data exploration: How do you know what your Big Data holds until you find out what you are collecting, what you are not, and what is missing? Normally this is done by collecting more and more data.
Data warehouse augmentation: “How do I take my existing analytics data, typically in a warehouse or mart form, and augment it with data feeds from outside sources to improve accuracy, reduce execution time, and give me the answers I need without reinventing the wheel?” Data warehouse adoption is becoming widespread, even for smaller organizations, as transactional data analysis is becoming a requirement for any organization at any level.
All of these use cases require more storage and more compute power. Big Data is now considered production data, so availability, recoverability, and performance are just as important as for the transactional systems within the organization. And as stated before, the trend is for IT budgets to get smaller, not bigger. These diametrically opposed forces are creating a change in the storage industry: how can you do more with less, without compromising system or application reliability, efficiency, or performance?
The demands that Big Data and analytic workloads place on enterprise storage can be
summed up as follows:
• Must have excellent performance
• Must have extremely high density
• Must have excellent uptime, high reliability
• Must be easy to use, easy to manage, easy to provision
• Must have attractively low total cost of ownership
This is where a brand-new storage architecture such as INFINIDAT’s InfiniBox™ can help.
High Performance
Large data sets, heavyweight analytics applications, and time-sensitive demands for results make high performance of the underlying storage a key criterion. In InfiniBox, maximum performance is achieved with no tuning or optimization. InfiniBox uses standard “off the shelf” hardware components (CPU/Memory/HDDs/SSDs) wrapped in sophisticated storage software that extracts the maximum performance from the 480 Near-Line SAS drives used in the InfiniBox architecture. One of the key elements developed in the core system code is the ability to analyze real application profiles and surgically define cache pre-fetch and de-stage algorithms. The system design specifically targets real-life profiles and provides optimum performance under those conditions. This capability is at the core of the InfiniBox architecture.
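INFINIDAT's actual pre-fetch logic is proprietary, but the general idea of profile-driven read-ahead can be sketched in a few lines of Python. This is an illustrative toy, not the InfiniBox algorithm; the stream-detection window and read-ahead depth are invented parameters.

```python
from collections import deque

class SequentialPrefetcher:
    """Toy read-ahead heuristic: if recent requests form a run of
    consecutive blocks, speculatively fetch the next few blocks."""

    def __init__(self, window=4, readahead=8):
        self.window = window        # consecutive hits that define a stream
        self.readahead = readahead  # blocks to fetch ahead once detected
        self.recent = deque(maxlen=window)

    def on_read(self, block):
        self.recent.append(block)
        hits = list(self.recent)
        if len(hits) == self.window and all(
            b2 - b1 == 1 for b1, b2 in zip(hits, hits[1:])
        ):
            # Sequential stream detected: prefetch the next blocks into cache.
            return list(range(block + 1, block + 1 + self.readahead))
        return []  # random access: no prefetch

prefetcher = SequentialPrefetcher()
for blk in [100, 101, 102, 103]:
    to_prefetch = prefetcher.on_read(blk)
print(to_prefetch)  # [104, 105, ..., 111]
```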
Large Data Sets
Large data sets pose a unique and daunting challenge to enterprise storage arrays: an I/O profile that is unpredictable and that often overwhelms the storage frame. This results in high latencies, which increase the run time of analytic workloads. Some analytic activities are very latency sensitive, and in many cases latency will affect the end-user population that the application supports. Many of these workloads will overwhelm storage platforms with limited cache sizes. InfiniBox contains up to 2.3TB of high-speed L1 DIMM cache and 23TB of L2 SSD cache, used to improve cache hits and reduce latency.
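To see why a second, much larger cache tier improves hit rates, consider the following simplified two-level lookup in Python. It is a conceptual model only, not InfiniBox's caching logic; the tier sizes and the LRU policy are assumptions for illustration.

```python
from collections import OrderedDict

class Tier:
    """Simple LRU cache tier."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        return False

    def put(self, key):
        self.entries[key] = True
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

def read(block, dram, ssd):
    """Return which layer served the request: DRAM, SSD, or disk."""
    if dram.get(block):
        return "DRAM hit"
    if ssd.get(block):
        dram.put(block)  # promote hot block to the faster tier
        return "SSD hit"
    ssd.put(block)       # stage into the SSD tier on a disk read
    dram.put(block)
    return "disk read"

dram, ssd = Tier(capacity=10), Tier(capacity=100)  # SSD tier ~10x larger
print(read(42, dram, ssd))  # disk read
print(read(42, dram, ssd))  # DRAM hit
```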
Big Data and Analytics Use Cases
• Exploration
• Enhanced Customer Service
• Security and Fraud Protection
• Operational Analysis
• DWH Augmentation

Storage Requirements for Big Data and Analytics
• Consistent Performance
• High Capacity
• Utmost Uptime and Reliability
• Easy to Manage and Provision
• Attractive Cost of Ownership
Many of these characteristics can be seen occurring simultaneously, while others are driven by specific activities such as backups or data load/ETL. InfiniBox thrives on supporting a wide range of I/O types, all at the same time. The data architecture virtualizes the storage for each volume by populating each of the 480 spindles in the InfiniBox frame, all in parallel, using a sophisticated dispersed data-and-parity layout.
In addition, InfiniBox utilizes very advanced capabilities to improve writes. Using a unique and patented multi-modal log write mechanism, INFINIDAT significantly improves the efficiency of write I/O de-staged from cache. This is very important for the Data Acquisition and ETL phases of this example.
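The multi-modal log write mechanism itself is patented and proprietary, but the underlying log-structured technique, absorbing small random writes and de-staging them as one large sequential pass, can be sketched as follows (the buffer size and flush threshold are invented for the example):

```python
class LogWriteBuffer:
    """Toy log-structured write buffer: random writes accumulate in a
    log and are de-staged to disk as one large sequential write."""

    def __init__(self, flush_threshold=16):
        self.flush_threshold = flush_threshold
        self.log = []  # pending (block_address, data) records

    def write(self, block, data):
        self.log.append((block, data))
        if len(self.log) >= self.flush_threshold:
            return self.destage()
        return None

    def destage(self):
        # Sort by block address so the de-stage is one sequential pass
        # instead of one random seek per write.
        batch = sorted(self.log)
        self.log = []
        return batch  # in a real system: a single large write to disk

buf = LogWriteBuffer(flush_threshold=4)
for blk in [907, 12, 530, 44]:  # scattered random writes
    flushed = buf.write(blk, b"...")
print(flushed)  # [(12, b'...'), (44, b'...'), (530, b'...'), (907, b'...')]
```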
Block Size Management Matters
Many analytic workloads can change the I/O profile on the fly, but in general the vast majority of Big Data and analytic applications use large-block I/O: loading data in from storage; reducing, sorting, and comparing it; then writing out aggregate data. Large blocks have historically given traditional storage platforms problems because most storage environments are not designed with large-block support in mind.
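A quick, admittedly crude way to observe the effect of block size on sequential throughput on any system is a loop like the one below; the file name and block sizes are arbitrary choices for the illustration.

```python
import time

def sequential_read(path, block_size):
    """Read a file front to back in fixed-size blocks; return MB/s."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed

# Compare a small 4 KB block against a large 1 MB block. On most systems
# the large block wins on sequential throughput because each I/O carries
# less per-request overhead. Assumes "testfile.bin" already exists.
for size in (4 * 1024, 1024 * 1024):
    mbps = sequential_read("testfile.bin", size)
    print(f"{size:>8} bytes/block: {mbps:8.1f} MB/s")
```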
Diverse I/O Profiles and Patterns Used in Big Data

We have seen many analytic environments exhibit I/O profiles containing all of the following characteristics:

• Data Acquisition: sequential write
• Profiling/Hygiene (cleanup): sequential read/sequential write
• Loading: sequential read/sequential write
• Query/Reporting: random read/sequential read
• Building Interim Data Mart: random read/sequential write
• Analytic Modeling: sequential read/random read
High Density
INFINIDAT has the ability to configure a system with over 2.7PB of usable storage in a single 19-inch rack. The InfiniBox storage system is a modern, fully symmetric grid, all-active controller (node) system with an advanced multi-layer caching architecture. The data architecture encompasses a double-parity (wide-stripe) data distribution model. This model uses a unique combination of random data distribution and parity protection, which ensures maximum data availability while minimizing data footprint. Each and every volume created on a single InfiniBox frame stores small pieces of data on each of the 480 drives in the frame. InfiniBox usable storage per frame is the highest in the storage industry.
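For intuition about what a dispersed, double-parity layout means, the sketch below scatters each stripe's chunks across randomly chosen drives and adds two parity chunks, so any two simultaneous drive failures remain recoverable. This is conceptual only; InfiniBox's actual placement and parity mathematics are proprietary, and the stripe width shown is illustrative.

```python
import random

DRIVES = 480        # drives in the frame
DATA_CHUNKS = 14    # data chunks per stripe (illustrative width)
PARITY_CHUNKS = 2   # double parity: survives any two drive failures

def place_stripe(stripe_id):
    """Pick distinct drives for each data and parity chunk of a stripe."""
    rng = random.Random(stripe_id)  # deterministic per stripe, for repeatability
    drives = rng.sample(range(DRIVES), DATA_CHUNKS + PARITY_CHUNKS)
    return {"data": drives[:DATA_CHUNKS], "parity": drives[DATA_CHUNKS:]}

# Every volume's stripes end up spread over the whole frame, so all 480
# spindles contribute bandwidth to every volume, and losing any two
# drives costs at most two chunks per stripe (both reconstructible).
layout = place_stripe(stripe_id=7)
print(layout["data"], layout["parity"])
```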
High Availability and Reliability
Keeping analytics data available is critical for every storage system. The InfiniBox architecture provides a robust, highly available storage environment with 99.99999% uptime. That equates to roughly 3 seconds of downtime a year! Our customers report no loss of data, even upon multiple disk failures. InfiniBox offers end-to-end business continuity features, including asynchronous remote mirroring and snapshots. Using snapshots, recovery of a database can be reduced to the time it takes to map the volumes to hosts: minutes instead of the hours a more traditional backup and recovery process would take.
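The downtime figure follows directly from the seven-nines availability number; a quick check:

```python
availability = 0.9999999             # "seven nines"
seconds_per_year = 365 * 24 * 3600   # 31,536,000 seconds
downtime = (1 - availability) * seconds_per_year
print(f"{downtime:.2f} seconds of downtime per year")  # ~3.15 seconds
```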
Easy to Use, Automated Provisioning and Management
The InfiniBox architecture, along with the elegant simplicity of its web-based GUI and built-in command line interface, allows for easy, fast deployment and management of the storage system. The amount of time saved on traditional storage administration tasks is huge. In addition, because of the InfiniBox open architecture and its support for RESTful APIs, platforms such as OpenStack and Docker can perform storage administration tasks at the application level, without the need to use the InfiniBox GUI.
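As an illustration of application-level provisioning over a REST API, here is a sketch in Python. The endpoint, payload fields, and credentials are hypothetical placeholders, not INFINIDAT's documented API; consult the InfiniBox REST reference for the real calls.

```python
import requests

# Hypothetical endpoint and fields, for illustration only; the real
# InfiniBox REST API may differ. See INFINIDAT's documentation.
BASE_URL = "https://infinibox.example.com/api"

def create_volume(name, size_gb, pool, session):
    """Provision a volume by POSTing to a (hypothetical) volumes endpoint."""
    resp = session.post(
        f"{BASE_URL}/volumes",
        json={"name": name, "size": size_gb * 2**30, "pool": pool},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

session = requests.Session()
session.auth = ("admin", "password")  # placeholder credentials
vol = create_volume("analytics-scratch-01", size_gb=512, pool="bigdata",
                    session=session)
print(vol)
```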
InfiniBox provides a management system that can isolate storage pools and volumes to specific users. Multi-tenancy features are supported so that application users, such as those deployed in a private cloud environment, can see and manage only the storage that has been granted to their user community.