Data Works
Adrian Waddy, Nick Vaughan and Eloise Hindes
The Building Blocks of Big Data
The Bank of England's journey to delivering a big data capability
Agenda
1. The Bank
Who we are, and what we do
3. Data Warehouse
Initial progress
2. Historic IT
Where are we starting from?
4. Hub 1 and 2
First adventures in Big Data
5. The Future
Where next? Scaling up
Any questions
The Bank
Who we are and what we do
3
The Bank
4
“Arguably we are
now the most
powerful,
unelected
institution our
country has ever
seen. We need to
respond to that
by becoming
more open, more
accountable and
more
transparent.”
Spencer Dale
The Bank
5
1694:
‘Promote the publick Good
and Benefit of our
People…’
Current:
‘Promoting the good of the
people of the United
Kingdom by maintaining
monetary and financial
stability’
The Bank
6
The Bank
7
Operations, Regulation, Policy
The Bank
8
Historic IT
Where are we starting from?
9
Historic IT
10
1980s-1990s
Historic IT
11
1990s-2000s
Data Warehouse
Initial Progress
12
Data Warehouse
13
Growing
Demand
Automated
processing
High
availability
Improved
capabilities
Data Warehouse
14
• Affordable scaling
• Fewer silos
• Significant volumes
Data Warehouse
15
• Given that:
• A step change in
capability was
realised
• The progress made
could only be
described as a
success
• Why the need for a
change of direction?
Data Warehouse
16
Operations: Complexity of estate
Regulation: EMIR
Policy: Changing nature of roles
Data Warehouse
17
• Data is being stored in databases,
shared drives and a document
management solution - difficult to
search, retrieve, combine and analyse
data
• Many individuals are reliant on their
experience and internal network to
determine what data exists
• Analytical communities in the Bank
would like to collaborate more and to
use new tools and techniques that are
becoming standard in highly analytical
data environments
• Not all individuals have access to the
right tools or environment to be able to
run analysis
Data Warehouse
18
• The nature of Economic publications gradually moved from qualitative to quantitative through the second half of the 20th century
• In the 21st century and in particular in
response to the Financial Crisis there
was a marked acceleration in this
process
• The variety of mathematical and
statistical operations increasingly
appearing in Economics publications
need data on which to operate!
http://www.istl.org/12-fall/refereed4.html
European Market Infrastructure Regulation (EMIR)
• European Parliament & Council of the EU
• Implementation of G20 commitment
• Risk management regulation
• Avoidance of systemic risk
• Reduce likelihood and severity of future shocks
• Applies to…
• Over-the-counter derivatives (OTC) *
• Central counterparties (CCP)
• Trade Repositories (TR)
19
• What this meant for the Bank of England
• Oversight of OTC & exchange trades
• For UK entities supervised by the PRA
• 85 million transactions from 6 TRs
• 80 files of varying schemas (up to 20 GB per file)
• 200+ columns per file
• A new data architecture to collect, store and process!
* $595 trillion market – Bank for International Settlements data, end of June 2018
Central Banks & Granular Data – 2013
20
• ‘The Future of Regulatory Data and Analytics’
• A new data strategy?
• Micro-prudential data with macro–financial statistics?
• Storing and making use of granular datasets?
• Can heterogeneous data be harmonised?
• Who pays the costs for larger, faster and more accurate data?
• Individual privacy vs public transparency?
• Prudential Regulation Authority
• A new legal subsidiary of the BoE
• Supervisory & regulatory responsibilities
• Promote the safety & soundness of regulated firms
• Contribute to securing protection for policyholders
• A requirement to collect, store and process more data
Centre for Central Banking Studies – July 2014
• ‘Big Data and Central Banks’
• Diversification of data sources
• Legalities of enabling / constraining scope of granular data collections
• Development of inductive analytical approaches
• Advancement of data analysis capabilities, ML & AI
• Open Source tooling
• Importance of ‘Big Data’ to Central Banks in the years ahead
21
• Could Big Data…
• Change the way that central banks operate?
• Transform how financial firms and other economic agents do business?
• Change the economy in ways that impact monetary and financial stability?
• Have implications for economic growth and employment?
https://www.aboveallimages.co.uk/wp-content/gallery/london/london_07.jpg
Bank of England Strategic Review – ‘One Mission, One Bank’
22
• ‘One Bank Data Architecture’
• Ability to share data across the Bank
• Reduce data silos
• Reduce the numbers of systems
• Improve discoverability
• Improve analytical capabilities via shared tooling
• Support genuine Big Data use cases
• Strategic data themes
• Management [Governance & Security]
• Collaboration [Sharing of Data]
• Standardisation [More robust processing]
• Exploitation [Tooling for gaining data insight]
Stage 1: The Appliance / Data Hub 1…
23
24
[Diagram: Landing Zone → Raw Zone. Zip files from five trade repositories (DTCC x20, UnaVista x12, CME x8, ICE x9, RegisTR x9) are FTP'd into the Landing Zone and unzipped into CSV in the Raw Zone, before flowing on to the Structured, Refined and Consume Zones.]
Source file format will change, although change will not affect the ingestion and unzip processes on the Raw Zone.
The Raw Zone stores historical data of source files in HDFS in its raw uncompressed format.
Description
1. FTP process to load zip files into the Data Hub cluster
Keep the existing process that moves zip files, provided by the business, from the Landing Zone into the Raw Zone.
2. Unzip process to extract raw data files
Keep the existing process that unzips files to their raw format. The unzipped CSV file is placed temporarily in an HDFS directory. An external Hive table is created at this directory, allowing the CSV file to be queried using Hive or SparkSQL. At the end of the process, this file is removed.
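As an illustration of step 2, here is a minimal PySpark sketch of exposing an unzipped CSV through an external Hive table and querying it with SparkSQL. The language, paths, table and column names are illustrative assumptions, not the Bank's actual implementation.

```python
# Minimal sketch of step 2, assuming hypothetical paths, table and column names.
# The unzipped CSV lands in a temporary HDFS directory; an external Hive table is
# defined over that directory so the file can be queried with Hive or SparkSQL,
# then the table is dropped once downstream processing completes.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("raw-zone-external-table")   # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

raw_csv_dir = "hdfs:///data_hub/raw/tr_file_tmp"  # hypothetical temporary directory

# External table: Hive only registers the location, it does not copy the data.
spark.sql(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_tr_file_tmp (
        trade_id STRING,
        counterparty_1 STRING,
        counterparty_2 STRING,
        notional DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '{raw_csv_dir}'
""")

# The file can now be queried with SparkSQL (or Hive) like any other table.
spark.sql("SELECT COUNT(*) FROM raw_tr_file_tmp").show()

# At the end of the process the temporary table is removed; the underlying
# directory is cleaned up separately because the table is EXTERNAL.
spark.sql("DROP TABLE IF EXISTS raw_tr_file_tmp")
```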
Benefits
• Standard ETL process within market best practices for loading and storage of data in its raw format
Limitations
• N/A
Low Level Design
Raw Zone
[Diagram: Raw Zone → Structured Zone. CSV files in the Raw Zone are converted by Spark jobs (steps 3 and 4) into ORC tables in the Structured Zone, which feed the Refined and Consume Zones.]
3. Spark jobs that insert each source file into an individual structured file table
Direct data ingestion from the source file into an ORC Hive table. Each TR file's data is ingested into a separate structured ORC table, avoiding any mapping at this stage. Having one table per file also adds flexibility to the process, both for change requests (changes are limited to the specific table, plus the mapping rules in the mapping table, if a file is added or an existing one is altered) and for the reprocessing workflow (only the partition of the given file needs to be rerun up to the mapping stage, reducing the overall workload).
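A minimal PySpark sketch of step 3 follows; the deck names Spark and Hive but not the language, so PySpark, the table and column names, and the hard-coded partition values are all assumptions for illustration.

```python
# Minimal sketch of step 3, assuming hypothetical table and column names.
# Each TR file is written to its own ORC Hive table, partitioned by year/month/day,
# with no cross-file column mapping at this stage.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("structured-zone-ingest")    # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

# Read the unzipped CSV registered in the Raw Zone (step 2).
df = spark.table("raw_tr_file_tmp")            # hypothetical source table

# Add the technical partition columns (in practice derived from the file name/date).
df = (df
      .withColumn("year",  F.lit(2019))
      .withColumn("month", F.lit(6))
      .withColumn("day",   F.lit(3)))

# One structured ORC table per TR, file type and schema version.
(df.write
   .format("orc")
   .mode("append")
   .partitionBy("year", "month", "day")
   .saveAsTable("structured_dtcc_trade_state_v1"))   # hypothetical table name
```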
Benefits
• Allows easy access to the raw data, without any changes to its underlying structure or format, with efficient compression for storage and query efficiency
• Having individual tables for each file simplifies the mapping process and diminishes the reprocessing workload
Limitations
• File sizes on tables will be suboptimal, although this is mitigated by the simplicity of the mapping process and flexibility to schema changes
4. Spark jobs that map each source file schema to a normalized schema for state information
Simplify the mapping process, in terms of both query complexity and performance, by having individual Spark mapping jobs that map each file to a normalized TR state schema, covering both table structure and data types. A short sketch of this mapping appears after the table below.
Table name: **_**_****_****_****** | Storage format: ORC | Partitions: year, month, day | Data sorted by: - | Description: One ORC table per TR, file and version that stores data in Hive without columns mapping
Table name: ********_***** | Storage format: ORC | Partitions: year, month, day, filetype | Data sorted by: - | Description: One ORC table for state TR data to store mapped columns in a normalized schema
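A minimal PySpark sketch of the step 4 mapping jobs, assuming hypothetical source and target tables and a small illustrative subset of columns; the real jobs map the full 200+ column schemas.

```python
# Minimal sketch of step 4, assuming hypothetical source/target tables and columns.
# Each per-file Spark job selects, renames and casts its columns into the single
# normalized TR state schema, so downstream queries only ever see one structure.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("structured-zone-mapping")   # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

src = spark.table("structured_dtcc_trade_state_v1")  # hypothetical per-file table

# Per-file mapping: column names and data types are normalized here, nowhere else.
normalized = src.select(
    F.col("trade_id").alias("trade_identifier"),
    F.col("counterparty_1").alias("c1"),
    F.col("counterparty_2").alias("c2"),
    F.col("notional").cast("decimal(25,5)").alias("notional_amount"),
    F.col("year"), F.col("month"), F.col("day"),
    F.lit("trade_state").alias("filetype"),          # partition key in the state table
)

(normalized.write
   .format("orc")
   .mode("append")
   .partitionBy("year", "month", "day", "filetype")
   .saveAsTable("tr_state_normalized"))              # hypothetical normalized table
```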
25
Description: Converts raw files into ORC and applies data type conversion and mapping rules to store information in a single table
Low Level Design
Structured Zone
[Diagram: Structured Zone → Refined Zone. The structured ORC tables, together with external reference data loaded from the Landing Zone (step 5), are combined by an enrichment job (step 6) into a single refined ORC table that feeds the Consume Zone.]
Extracts external data sources in order to enrich and validate TR data, maintaining historical data for reprocessing purposes.
Table name: ****_****_***** | Storage format: ORC | Partitions: year, month, day | Data sorted by: assetclass, counterparty | Description: Stores TR data enriched with external data sources and additional columns calculated based on business rules. These columns include the de-duplication rule set.
5. Load external data sources
Process that loads, unzips and inserts external data into Hive tables for use in the data preparation step.
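A minimal PySpark sketch of step 5, assuming a hypothetical reference file and table name; the unzip step is the same pattern shown for the Raw Zone and is omitted here.

```python
# Minimal sketch of step 5, assuming hypothetical paths and table names.
# External reference data is read as CSV and kept in a Hive table so that the
# enrichment step can join against it, with history retained for reprocessing.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("refined-zone-reference-load")     # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

ref = (spark.read
       .option("header", "true")
       .csv("hdfs:///data_hub/raw/reference/lei_register.csv"))  # hypothetical unzipped file

(ref.write
   .format("orc")
   .mode("append")                                   # keep history for reprocessing
   .saveAsTable("external_reference_lei"))           # hypothetical reference table
```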
6. Spark job that applies business rules and enriches source data with external table information
Calculate additional business columns and enrich with external reference data. Apply the de-duplication rule set and Contract Continuity specifics.
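A minimal PySpark sketch of step 6. The join key, the derived column and the "latest report wins" window rule are simplified stand-ins for the actual de-duplication rule set and Contract Continuity logic, and every name is hypothetical.

```python
# Minimal sketch of step 6, assuming hypothetical table, column and rule names.
# The normalized TR state data is enriched by joining to an external reference
# table, derived business columns are added, and a window-based rule keeps only
# the latest report per trade (a simplified stand-in for the real rules).
from pyspark.sql import SparkSession, functions as F, Window

spark = (SparkSession.builder
         .appName("refined-zone-enrich")             # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

state = spark.table("tr_state_normalized")           # hypothetical step-4 output
ref   = spark.table("external_reference_lei")        # hypothetical step-5 output

# Enrich with reference data and derive additional business columns.
enriched = (state
            .join(ref, state.c1 == ref.lei, "left")
            .withColumn("is_intragroup", F.col("c1") == F.col("c2")))

# Simplified de-duplication: keep the most recent report per trade identifier.
w = Window.partitionBy("trade_identifier").orderBy(F.desc("reporting_timestamp"))
deduped = (enriched
           .withColumn("rn", F.row_number().over(w))
           .filter(F.col("rn") == 1)
           .drop("rn"))

(deduped.write
   .format("orc")
   .mode("overwrite")
   .partitionBy("year", "month", "day")
   .saveAsTable("tr_state_refined"))                 # hypothetical refined table
```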
Description: Creates materialized views for business consumption that are optimized for system performance
26
Benefits
• Centralized table that aggregates all TR state information into a single point of access
• Segregation of concepts, by calculating business logic rules and enriching source data with external sources in a separate layer
Limitations
• Late arrival of files requires reprocessing of the daily partition
• Changes in business transformation requirements require reprocessing of the full table
Low Level Design
Refined Zone
[Diagram: Refined Zone → Consume Zone. The refined ORC table is replicated into several ORC materialized views in the Consume Zone, each physically partitioned for a different entry point of analysis.]
Table name: **_*****_****_*****_***_**** | Storage format: ORC | Partitions: year, month, day, assetclass | Data sorted by: otc_or_etd, c1, c2 | Use cases: *****, *****, Contractual Continuity, *****
Table name: **_*****_****_*****_***_**** | Storage format: ORC | Partitions: year, month, day | Data sorted by: c1, c2 | Use cases: *****
Table name: **_*****_****_***** | Storage format: ORC | Partitions: year, month, assetclass | Data sorted by: c1, c2 | Use cases: Monthly time series
7. Spark job that creates materialized views physically optimized for standard in-house entry points of analysis
Replicates data from the Refined Zone into the Consume Zone, with optimized technical partitions, to allow fast performance for querying and data exploration across different analytical use cases. The process can easily be replicated to accommodate further use cases by creating new partition keys.
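A minimal PySpark sketch of step 7, assuming hypothetical view names: each Consume Zone materialized view is simply another partitioned write of the refined table, so adding a use case means adding another write with a different partition key.

```python
# Minimal sketch of step 7, assuming hypothetical table and column names.
# The Refined Zone table is replicated into Consume Zone views whose physical
# partitioning matches the standard entry points of analysis.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("consume-zone-views")              # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

refined = spark.table("tr_state_refined")            # hypothetical refined table

# View optimized for asset-class centred analysis.
(refined.write
   .format("orc")
   .mode("overwrite")
   .partitionBy("year", "month", "day", "assetclass")
   .saveAsTable("consume_tr_state_by_assetclass"))   # hypothetical view name

# View optimized for monthly time-series analysis.
(refined.write
   .format("orc")
   .mode("overwrite")
   .partitionBy("year", "month", "assetclass")
   .saveAsTable("consume_tr_state_monthly"))         # hypothetical view name
```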
Description: Creates materialized views for business consumption that are optimized for system performance
Benefits
• Captures generic entry points of analysis
• Optimized to accommodate different analytical workloads based on requirements
• Improves query performance due to physical partitioning of data
Limitations
• Duplication of data, and the onus of assessing the correct materialised view is on the user. This could be mitigated by including an OLAP cube, such as Apache Druid
27
Low Level Design
Consume Zone
28
EMIR Trade Repositories framework
[Diagram: End-to-end EMIR Trade Repositories framework. TR data (zip → csv) and reference data move from the Landing Zone through the Raw, Structured, Refined and Consume Zones as ORC tables (including the mappings and TR state data tables), with Data Governance applied across all zones.]
29
EMIR Project benefits for the wider Data Programme
Designed to set the right path for the Data Programme in 4 key aspects, aligned with the One Bank Value:
• Architecture: Set the right technical architecture to serve as a standard for BoE Big Data projects
• Self-service TOM: Provide the drivers for a more self-service Operating Model
• Data Science skills: Pair programming sessions for on-the-job training and coaching
• Data Plausibility: Deliver a Data Quality and plausibility Management solution to be used across the Data Programme
30
Demonstrate that Data Science knowledge can be upskilled
A senior Data Scientist will deliver on-the-job training and coaching to FMID in order to upskill the existing team. From this, we expect users to gain the autonomy to develop new data analysis and ad hoc data exploration on existing datasets in the Data Hub.
31
Questions still open:
1. How will training be delivered to business areas?
2. What skills should be centralised and what should stay in each business team?
3. Upscale the current team skillset or expand resources?

How can Data Science skills be attained?
• Training: Provide core skills and understand how to use Big Data tools
• On-the-job coaching: Pair programming and advisory work to provide experience using Big Data tools with R
Data Hub 2
32
Automation, Dynamic Provisioning, Flexibility
Data Hub 2
33
Data Hub 2
34
• VMware VxRack HCI offering
• EMC's Isilon storage
• 392 cores per site, and circa 10 TB of RAM
• 320 TB of "usable" storage
• Storage: the equivalent of 7,500 standard iPhone Xs (1.32 tonnes of iPhones!)
• Processing: the equivalent "cores" of 84 standard iPhone Xs
• Memory: the equivalent RAM of 4,608 standard iPhone Xs (a pile of phones 35½ metres high)
Data Hub 2
35
Lessons Learned
36
People and Processes
Technology
Governance and Metadata
Creativity
Tenacity
Experience
Questions

More Related Content

What's hot

Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
DataWorks Summit
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
DataWorks Summit
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industry
DataWorks Summit
 
Journey to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsJourney to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, Benefits
DataWorks Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
DataWorks Summit/Hadoop Summit
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Seeling Cheung
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
DataWorks Summit
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
Thoughtworks
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
DataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
DataWorks Summit
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
Oleksii Movchaniuk
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
DataWorks Summit
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
DataWorks Summit/Hadoop Summit
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
DataWorks Summit
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
DataWorks Summit
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
DataWorks Summit
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
DataWorks Summit
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
DataWorks Summit
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
DataWorks Summit
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
DataWorks Summit
 

What's hot (20)

Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industry
 
Journey to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsJourney to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, Benefits
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
 

Similar to Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability

Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
Kent Graziano
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
Rishikese MR
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
Sunderland City Council
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
Saikiran Panjala
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
DATAVERSITY
 
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data DeliveryModernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery
Denodo
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Edgar Alejandro Villegas
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an Overview
Zahra Mansoori
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
Denodo
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
A P
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Soujanya V
 
BDA-Module-1.pptx
BDA-Module-1.pptxBDA-Module-1.pptx
BDA-Module-1.pptx
ASHWIN808488
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoDB
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
Er. Nawaraj Bhandari
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
DATAVERSITY
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Denodo
 

Similar to Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability (20)

Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data DeliveryModernizing Data Architecture using Data Virtualization for Agile Data Delivery
Modernizing Data Architecture using Data Virtualization for Agile Data Delivery
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an Overview
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
BDA-Module-1.pptx
BDA-Module-1.pptxBDA-Module-1.pptx
BDA-Module-1.pptx
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
MongoBD London 2013: Real World MongoDB: Use Cases from Financial Services pr...
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Promote the Good of the People of the United Kingdom by Maintaining Monetary and Financial Stability

Central Banks & Granular Data – 2013
20
• ‘The Future of Regulatory Data and Analytics’
• A new data strategy?
• Micro-prudential data with macro-financial statistics?
• Storing and making use of granular datasets?
• Can heterogeneous data be harmonised?
• Who pays the costs for larger, faster and more accurate data?
• Individual privacy vs public transparency?
• Prudential Regulation Authority
• A new legal subsidiary of the BoE
• Supervisory & regulatory responsibilities
• Promote the safety & soundness of regulated firms
• Contribute to securing protection for policyholders
• A requirement to collect, store and process more data
Centre for Central Banking Studies – July 2014
21
• ‘Big Data and Central Banks’
• Diversification of data sources
• Legalities of enabling / constraining the scope of granular data collections
• Development of inductive analytical approaches
• Advancement of data analysis capabilities, ML & AI
• Open Source tooling
• Importance of ‘Big Data’ to Central Banks in the years ahead
• Could Big Data…
• Change the way that central banks operate?
• Transform how financial firms and other economic agents do business?
• Change the economy in ways that impact monetary and financial stability?
• Have implications for economic growth and employment?
https://www.aboveallimages.co.uk/wp-content/gallery/london/london_07.jpg
Bank of England Strategic Review – ‘One Mission, One Bank’
22
• ‘One Bank Data Architecture’
• Ability to share data across the Bank
• Reduce data silos
• Reduce the number of systems
• Improve discoverability
• Improve analytical capabilities via shared tooling
• Support genuine Big Data use cases
• Strategic data themes
• Management [Governance & Security]
• Collaboration [Sharing of Data]
• Standardisation [More robust processing]
• Exploitation [Tooling for gaining data insight]
Stage 1: The Appliance / Data Hub 1…
23
Low Level Design – Raw Zone
24
• The Landing Zone receives zip files from the Trade Repositories: DTCC (x20), UnaVista (x12), CME (x8), ICE (x9) and RegisTR (x9); each zip contains csv files that flow via FTP and unzip into the Raw Zone and onwards to the Structured, Refined and Consume Zones
• Source file formats will change over time, although changes will not affect the ingestion and unzip processes in the Raw Zone
• The Raw Zone stores the history of source files in HDFS in their raw, uncompressed format
• Step 1 – FTP process to load zip files into the Data Hub cluster: keep the existing process that moves zip files, provided by the business, from the Landing Zone into the Raw Zone
• Step 2 – Unzip process to extract raw data files: keep the existing process that unzips files to their raw format. The unzipped csv file is placed temporarily in an HDFS directory, and an external Hive table is created over this directory, allowing the csv file to be queried using Hive or SparkSQL. At the end of the process the temporary file is removed (a sketch of this approach follows below)
• Benefits: a standard ETL process, within market best practice, for loading and storing data in its raw format
• Limitations: N/A
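To make the external-table approach in step 2 concrete, here is a minimal SparkSQL sketch. The table name raw_dtcc_trade_state, the HDFS path and the column list are hypothetical illustrations, not the Bank's actual schema.

```python
from pyspark.sql import SparkSession

# Hive support lets us register external tables over files already sitting in HDFS
spark = (SparkSession.builder
         .appName("raw-zone-external-table")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical temporary path for one unzipped TR csv file
raw_path = "hdfs:///data_hub/raw_zone/tmp/dtcc/2018/06/30"

# External table: Hive only stores metadata, the csv stays where the unzip step left it,
# so dropping the table later does not delete the underlying file
spark.sql(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_dtcc_trade_state (
        trade_id      STRING,
        counterparty1 STRING,
        counterparty2 STRING,
        asset_class   STRING,
        notional      DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '{raw_path}'
""")

# The file can now be queried with Hive or SparkSQL before the temporary copy is removed
spark.sql("SELECT asset_class, COUNT(*) FROM raw_dtcc_trade_state GROUP BY asset_class").show()
```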
Low Level Design – Structured Zone
25
• Converts raw csv files into ORC and applies data type conversions and mapping rules so that state information is stored in a single table
• Step 3 – Spark jobs insert each source file into an individual structured table: direct ingestion from the source file into an ORC Hive table. Each TR file is ingested into a different structured ORC table, avoiding any mapping at this stage. One table per file also adds flexibility: change requests are limited to the specific table (plus the mapping rules if a file is added or altered), and reprocessing only needs to re-run the partition of the given file up to the mapping stage, reducing the overall workload
• Step 4 – Spark jobs map each source file schema to a normalised schema for state information: the mapping process is kept simple, in terms of both query complexity and performance, by having individual Spark mapping jobs target a normalised state TR schema for table structure and data types (a sketch of both steps follows below)
• Tables:
• **_**_****_****_****** – ORC, partitioned by year, month, day: one ORC table per TR, file and version, storing data in Hive without column mapping
• ********_***** – ORC, partitioned by year, month, day, filetype: one ORC table for state TR data, storing mapped columns in a normalised schema
• Benefits: easy access to the raw data, without changes to its underlying structure or format, with efficient compression for storage and query efficiency; individual tables per file simplify the mapping process and reduce the reprocessing workload
• Limitations: file sizes on the tables will be suboptimal, although this is mitigated by the simplicity of the mapping process and flexibility to schema changes
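A minimal sketch of the kind of Spark jobs steps 3 and 4 describe, reusing the hypothetical raw_dtcc_trade_state table from the previous sketch; all table and column names, partition values and type choices are illustrative assumptions rather than the real EMIR schemas.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("structured-zone-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Step 3: ingest one TR file into its own ORC table, unmapped, partitioned by load date
raw = spark.table("raw_dtcc_trade_state")           # hypothetical external table over the csv
(raw
 .withColumn("year", F.lit(2018))
 .withColumn("month", F.lit(6))
 .withColumn("day", F.lit(30))
 .write
 .format("orc")
 .mode("append")
 .partitionBy("year", "month", "day")
 .saveAsTable("structured_dtcc_trade_state"))        # one table per TR / file / version

# Step 4: map this file's columns onto a shared, normalised state schema
normalised = (spark.table("structured_dtcc_trade_state")
              .select(
                  F.col("trade_id").alias("trade_identifier"),
                  F.col("counterparty1").alias("reporting_counterparty"),
                  F.col("counterparty2").alias("other_counterparty"),
                  F.col("asset_class"),
                  F.col("notional").cast("decimal(20,2)").alias("notional"),
                  "year", "month", "day")
              .withColumn("filetype", F.lit("dtcc_trade_state")))

(normalised.write
 .format("orc")
 .mode("append")
 .partitionBy("year", "month", "day", "filetype")
 .saveAsTable("normalised_trade_state"))
```

Because each source file has its own mapping job onto the shared schema, a schema change from one Trade Repository only touches that file's table and mapping, which is the flexibility benefit noted above.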
Low Level Design – Refined Zone
26
• Extracts external data sources in order to enrich and validate TR data, maintaining historical data for reprocessing purposes
• Step 5 – Load external data sources: a process that loads, unzips and inserts external data into Hive tables for use in the data preparation step
• Step 6 – Spark job that applies business rules and enriches source data with external table information: calculates additional business columns, enriches the data with external reference data, and applies the de-duplication rule set and Contract Continuity specifics (a sketch follows below)
• Table ****_****_***** – ORC, partitioned by year, month, day, sorted by assetclass and counterparty: stores TR data enriched with external data sources and additional columns calculated from business rules, including the de-duplication rule set
• Benefits: a centralised table that aggregates all TR state information into a single point of access; segregation of concerns, with business logic rules and enrichment from external sources calculated in a separate layer
• Limitations: late arrival of files requires reprocessing of the daily partition; changes in business transformation requirements require reprocessing of the full table
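A hedged sketch of a step-6-style enrichment and de-duplication job in Spark; the reference table external_reference_data, the LEI join key and the "keep the latest report per trade" rule are assumptions for illustration, not the Bank's actual business rules or de-duplication rule set.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("refined-zone-enrich")
         .enableHiveSupport()
         .getOrCreate())

state = spark.table("normalised_trade_state")        # hypothetical output of the mapping jobs
ref_cols = (spark.table("external_reference_data")   # hypothetical counterparty reference table
            .select("lei", "counterparty_sector"))

# Enrich TR state data with reference attributes via a left join on the reporting counterparty
enriched = (state.join(ref_cols,
                       state.reporting_counterparty == ref_cols.lei,
                       "left")
            .drop("lei")
            # Example derived business column: flag trades with no matching reference record
            .withColumn("sector_missing", F.col("counterparty_sector").isNull()))

# Illustrative de-duplication rule: keep only the most recent report per trade identifier
latest_first = Window.partitionBy("trade_identifier").orderBy(
    F.col("year").desc(), F.col("month").desc(), F.col("day").desc())

deduplicated = (enriched
                .withColumn("rn", F.row_number().over(latest_first))
                .filter(F.col("rn") == 1)
                .drop("rn"))

(deduplicated.write
 .format("orc")
 .mode("overwrite")
 .partitionBy("year", "month", "day")
 .saveAsTable("refined_trade_state"))
```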
Low Level Design – Consume Zone
27
• Creates materialised views for business consumption, physically optimised for system performance
• Step 7 – Spark job that creates materialised views physically optimised for the standard in-house entry points of analysis: replicates data from the Refined Zone into the Consume Zone with optimised technical partitions, allowing fast querying and data exploration for the different analytical use cases. The process can easily be replicated to accommodate further use cases by creating new partition keys (a sketch of one such view follows below)
• Example views (table names masked):
• **_*****_****_*****_***_**** – ORC, partitioned by year, month, day, assetclass, sorted by otc_or_etd, c1, c2
• **_*****_****_*****_***_**** – ORC, partitioned by year, month, day, sorted by c1, c2: Contract Continuity
• **_*****_****_***** – ORC, partitioned by year, month, assetclass, sorted by c1, c2: monthly time series
• Benefits: captures the generic entry points of analysis; optimised to accommodate different analytical workloads based on requirements; improves query performance due to the physical partitioning of data
• Limitations: duplication of data, and the onus of choosing the correct materialised view is on the user. This could be mitigated by including an OLAP cube, such as Apache Druid
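A minimal illustration of step 7, materialising one use-case-specific copy of the refined data with its own partitioning; the view name consume_monthly_time_series and the partition and sort keys follow the pattern above but are otherwise assumed.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("consume-zone-views")
         .enableHiveSupport()
         .getOrCreate())

refined = spark.table("refined_trade_state")    # hypothetical Refined Zone table

# Monthly time-series view: partitioned by year, month and asset class so that the common
# "one month, one asset class" queries touch only a single physical partition
(refined
 .repartition("year", "month", "asset_class")
 .sortWithinPartitions("reporting_counterparty", "other_counterparty")
 .write
 .format("orc")
 .mode("overwrite")
 .partitionBy("year", "month", "asset_class")
 .saveAsTable("consume_monthly_time_series"))

# Analysts then query whichever view matches their entry point, e.g.:
spark.sql("""
    SELECT asset_class, COUNT(*) AS trades
    FROM consume_monthly_time_series
    WHERE year = 2018 AND month = 6
    GROUP BY asset_class
""").show()
```

Partitioning each view around its dominant query pattern trades extra storage for query speed, which is exactly the duplication noted under Limitations above.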
EMIR Trade Repositories framework
28
• End-to-end flow: Landing Zone (TR data and reference data arriving as zip/csv) → Raw Zone → Structured Zone (ORC) → Refined Zone (TR state data enriched via mappings) → Consume Zone, with Data Governance applied across all zones
EMIR project benefits for the wider Data Programme
30
Designed to set the right path for the Data Programme in four key aspects, aligned with the One Bank values:
• Architecture – set the right technical architecture to serve as a standard for BoE Big Data projects
• Self-service TOM – provide the drivers for a more self-service Operating Model
• Data Science skills – pair programming sessions for on-the-job training and coaching
• Data Plausibility – deliver a Data Quality and Plausibility Management solution to be used across the Data Programme
How can Data Science skills be attained?
31
• Demonstrate that Data Science knowledge can be upskilled: a senior Data Scientist will deliver on-the-job training and coaching to FMID in order to upskill the existing team. From this, we expect users to gain the autonomy to develop new data analysis and ad hoc data exploration on existing datasets in the Data Hub
• Training – provide core skills and an understanding of how to use Big Data tools
• On-the-job coaching – pair programming and advisory work to provide experience of using Big Data tools with R
• Questions still open:
1. How will training be delivered to business areas?
2. What skills should be centralised and what should stay in each business team?
3. Upskill the current team or expand resources?
Data Hub 2
32
• Automation
• Dynamic Provisioning
• Flexibility
Data Hub 2
34
• VMware VxRack HCI offering
• EMC's Isilon storage
• 392 cores per site, and circa 10 TB RAM
• 320 TB of "usable" storage
• Storage: the equivalent of 7,500 standard iPhone Xs (1.32 tonnes of iPhones!)
• Processing: the equivalent cores of 84 standard iPhone Xs
• Memory: the equivalent RAM of 4,608 standard iPhone Xs (a pile of phones 35½ metres high)
Lessons Learned
36
• People and Processes
• Technology
• Governance and Metadata
• Creativity
• Tenacity
• Experience