This document discusses harnessing big data tools in financial services. It describes three types of big data problems: those with high data volume and low algorithm complexity, those with any data volume but high algorithm complexity, and those inherently involving big data. It also covers implications for data security, governance, and data center infrastructure, concluding that big data tools require maturing security controls while data centers will utilize many energy efficient servers with local storage.
3. Overview
Based on a blog post from April 2012 – http://is.gd/swbdla
[Figure: "Problem Types" chart with axes Data Volume and Algorithm Complexity, showing regions labelled Big Data, Quant and Simple]
4. Simple problems
Low data volume, low algorithm complexity
5. Quant Problems
Any data volume, high algorithm complexity
6. Big Data Problems
High data volume, low algorithm complexity
Types of Big Data Problem:
1. Inherent
2. More data gives better result than more complex algorithm
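The three problem types described on slides 4 to 6 can be sketched as a small classifier. This is a minimal illustration rather than anything from the original deck: the names `ProblemType` and `classify` are hypothetical, and reducing each axis to a high/low boolean is an assumption, since the slides give no thresholds.

```python
from enum import Enum


class ProblemType(Enum):
    """The three regions of the 'Problem Types' chart."""
    SIMPLE = "simple"
    QUANT = "quant"
    BIG_DATA = "big data"


def classify(high_data_volume: bool, high_algorithm_complexity: bool) -> ProblemType:
    """Map the two chart axes onto a problem type (hypothetical helper)."""
    # Quant problems: any data volume, high algorithm complexity.
    if high_algorithm_complexity:
        return ProblemType.QUANT
    # Big data problems: high data volume, low algorithm complexity.
    if high_data_volume:
        return ProblemType.BIG_DATA
    # Simple problems: low data volume, low algorithm complexity.
    return ProblemType.SIMPLE
```

Algorithm complexity is checked first because the quant category covers any data volume, so a workload that is high on both axes is treated as a quant problem.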
7. The good, the bad and the ugly of Big Data
Good
– Lots of new tools, mostly open source
Bad
– Term being abused by marketing departments
Ugly
– Can easily lead to over reliance on systems that lack transparency and ignore specific data points
'Computer says no', but nobody can explain why
8. Misquoting Roger Needham
Whoever thinks their analytics problem is solved by big data, doesn’t understand their analytics problem and doesn’t understand big data
10. The priesthood of storage and the cult of the DBA
Enterprise storage systems have (mostly) their own interconnect and their own special people to look after that, any changes (weekends only) and backups
– The priesthood of storage
Relational Database Management Systems (RDBMS) are about more than just SQL
– Backup and recovery
– Access control
– Identity management
– Integration with enterprise directories
– Data security
– Encryption
– Schema management
– Glossaries and data dictionaries
DataBase Administrators (DBAs) have become the guardians of all this
– The cult of the DBA
Anything not under the management of the cult doesn't count as being part of the official 'books and records of the firm'
– Or at least that's what they'll tell you
11. NOSQL as a hack around corporate governance
Many 'Big Data' tools also fly under the banner of 'NOSQL'
NOSQL offers an escape from the clutches of the priesthood of storage and the cult of the DBA
The reason for choosing Cassandra (or whatever) for a project might have nothing to do with 'Big Data'
Security is often viewed as an optional non-functional requirement
– Big Data security controls may be less mature than those of a traditional RDBMS
– So compensating controls must be used for whatever is missing out of the box
– The 3rd party tools market is still nascent
– So there is less choice for bolt-on security
NOSQL hasn't yet become an integral part of organisational structure/culture
13. Simple problems
Low data volume, low algorithm complexity
This is the type of problem that has traditionally worked a single machine (the database server) really hard.
• Reliability has always been a concern for single box designs (though this is a solved problem where synchronous replication is used).
• This is what makes SAN attractive
• No special considerations for network and storage
[Chart: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data]
14. Quant Problems – the easy part
Any data volume, high algorithm complexity
High Performance Compute (HPC) impact is well understood:
• Lots of machines at the optimum CPU/$ price point
• Previously optimised for CAPEX
• Present trend is to optimise for TCO (especially energy)
• No real challenges around storage or interconnect
• Though some local caching using a 'data grid' may improve duty cycle over a pure stateless design
[Chart: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data, plus an 'HPC' label]
15. Quant Problems – the hard part
Any data volume, high algorithm complexity
Data intensive HPC shifts the focus to interconnect and storage:
• Fast network (>1 Gb Ethernet) may be needed to get data where it's needed
• 10 Gb Ethernet (or faster)
• InfiniBand if latency is an issue
• SANs don't work at this scale (and are too expensive anyway)
• Data needs to be sharded across inexpensive local discs
[Chart: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data, plus a 'Data intensive HPC' label]
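A rough worked example of why link speed matters for data intensive HPC: the sketch below computes the best-case time to move a data set over a single link. The 1 TB data set size is a hypothetical figure chosen for illustration, and the calculation ignores protocol overhead, contention and disc speed, so real transfers will be slower.

```python
def transfer_seconds(data_bytes: float, link_gbit_per_s: float) -> float:
    """Best-case time to push data_bytes over one link of the given speed.

    Link speed is in gigabits per second (10**9 bits/s), which is how
    Ethernet speeds are rated. Overheads are deliberately ignored.
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    bits = data_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_terabyte = 1e12  # hypothetical data set size, in bytes
    for speed in (1, 10):
        minutes = transfer_seconds(one_terabyte, speed) / 60
        print(f"{speed} Gb/s: about {minutes:.0f} minutes per terabyte")
```

On these assumptions a terabyte takes 8,000 seconds (over two hours) at 1 Gb/s and 800 seconds at 10 Gb/s, which is why keeping data on local discs close to the compute becomes attractive.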
16. Big Data Problems – look easy now
High data volume, low algorithm complexity
Typically less demanding on interconnect than data intensive HPC workloads:
• Ethernet likely to be sufficient
Many things that wear the 'big data' label are in fact solutions for sharding large data sets across inexpensive local disc
• E.g. This is what the Hadoop Distributed File System (HDFS) does
[Chart: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data]
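The idea of spreading a large data set across many inexpensive local discs can be sketched with simple hash-based sharding. This is a generic illustration and not how HDFS works internally: HDFS splits files into blocks and places replicas of those blocks on different nodes rather than hashing record keys. The function names and the sample keys below are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a key using a stable hash.

    hashlib is used instead of the built-in hash() so that the same key
    maps to the same shard across processes and machines.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_records(records, num_shards: int):
    """Distribute (key, value) records into num_shards lists."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


if __name__ == "__main__":
    # Hypothetical records; each shard would live on one server's local disc.
    records = [(f"trade-{i}", i) for i in range(20)]
    for index, shard in enumerate(shard_records(records, 4)):
        print(f"shard {index}: {len(shard)} records")
```

A plain modulo scheme like this reshuffles most keys when the number of shards changes, which is one reason production systems tend to use block placement or consistent hashing instead.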
17. The role of SSD
At least for the time being this is a delicate balance between capacity and speed
Applications that become I/O bound with traditional disc need to make a value judgement
on scaling the storage element (switch to SSD) versus scaling the entire solution (buy
more servers and electricity).
– Falling prices will tilt balance towards SSD
Worth noting that many traditional databases will now fit into RAM (especially if spread
across a number of machines), which leaves an emerging SSD sweet spot across the
middle of the chart.
Attention needs to be paid to the 'impedance mismatch' between contemporary workloads
(like Cassandra) and contemporary storage (like SSD). This is not handled well by
decades old file systems (and for a long time the RDBMS vendors have cheated by having
their own file systems).
SSD will hit the feature size scaling wall at the same time as CPU
– Spinning disc (and other technologies) will not
– Enjoy the ride whilst it lasts (perhaps not too much longer)
– Interesting things will happen when the exponential growth curves we've become accustomed to flatten out whilst other growth curves continue
18. The future of block storage
SAN/NAS stops being a category in its own right and becomes part of the software
defined data centre
– SAN (and especially dedicated fibre channel networks) goes away altogether
– NAS folds into the commodity server space – looks like DAS at the hardware layer but
behaves like NAS from a software perspective
– Dedicated puddles of software defined storage will be aligned to 'big data', but the overall
capacity management should ultimately be defined by the first exhausted commodity (CPU,
RAM, I/O, disc)
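The point about capacity being defined by the first exhausted commodity can be illustrated with a small calculation: whichever of CPU, RAM, I/O or disc has the highest utilisation is the one that limits further growth. The function name, the units and the figures below are all hypothetical and are only there to make the sketch runnable.

```python
def first_exhausted(used: dict, capacity: dict):
    """Return (commodity, utilisation) for the most heavily used commodity.

    used and capacity map commodity names to amounts in matching units.
    Utilisation is used / capacity, so 1.0 means fully exhausted.
    """
    utilisation = {}
    for name, total in capacity.items():
        if total <= 0:
            raise ValueError(f"capacity for {name!r} must be positive")
        utilisation[name] = used.get(name, 0) / total
    limiting = max(utilisation, key=utilisation.get)
    return limiting, utilisation[limiting]


if __name__ == "__main__":
    # Hypothetical figures for one pool of commodity servers.
    capacity = {"cpu_cores": 64, "ram_gb": 256, "io_ops": 10000, "disc_tb": 48}
    used = {"cpu_cores": 40, "ram_gb": 200, "io_ops": 9000, "disc_tb": 30}
    name, level = first_exhausted(used, capacity)
    print(f"limiting commodity: {name} at {level:.0%} utilisation")
```

In this made-up pool I/O is the constraint, so adding discs or CPU alone would not increase usable capacity.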
19. Data Centre impact - Summary
More simple, energy efficient servers with local disk
Fewer big boxes connected to SAN
Everything looks the same (less diversity in hardware)
Everything uses the minimum possible energy
'Big Data' is a part of the overall capacity management problem
Data centre automation will solve for optimal equipment/energy use
21. Conclusions
'Big Data' is a label that is used to describe an emerging category of tools that are useful for
problems with large data volume and low algorithmic complexity
The technical and organisational means to provide security and governance for these
tools are less mature than for traditional databases
Data centres will fill up with more low end servers using local storage (and these will likely
be the designs emerging from hyperscale operators that are optimised for manufacturing
and energy efficiency)