Big Data opportunities for Banks, Providers & Intermediaries and the role of their Software Vendor
by Mark Pearce
Avelo, Cheltenham, England, UK
Table of Contents
Abstract
Who is Avelo?
What is Big Data?
Business Benefits of Big Data
Big Data within Financial Services
The Avelo Big Data project
Constructing a Big Data capability at Avelo
What is Avelo’s protection gap Big Data offering?
What other Big Data offerings could Avelo explore?
Summary
References
Glossary
Abstract
Avelo is a UK financial software vendor that has an active program underway to integrate its applications;
at the same time it’s also researching how it can leverage Big Data. The financial services market is shifting
from an emphasis on process to an emphasis on data. Customers are no longer willing to accept silo
applications and instead expect applications that plug and play together with integrated data to satisfy
user needs. Avelo is focused on unifying data, frameworks, functionality and presentation to create a
platform built on common patterns and practices. This paper examines how Avelo could implement Big
Data projects to unlock additional value for its clients. We believe that many of Avelo’s experiences will be
informative to readers in other companies.
Who is Avelo?
Avelo is a UK financial software vendor that is a fusion of formerly separate but highly successful niche
companies. As a consequence of this heritage, Avelo products have varied data representations, a mix of
approaches to functionality and different presentations. The products cover a wide spectrum of usage,
serving individual consumers, intermediaries, small financial institutions, networks of financial advisors,
large insurers and large banks.
Most of Avelo’s software is high-value operational software that performs day-to-day business processing.
However, Avelo is also building and investigating analytical services that mine the data obtained from its
products to provide insights into the behaviour of its customers’ customers. Avelo’s current business is
focused on the UK, but the company is looking to expand its presence into continental Europe, the US, and
beyond.
Avelo has applications that are leaders in the UK market, such as the following:
o Adviser Office. This software is used by more than a thousand adviser firms offering wealth
management and financial advice, both multi-tied and independent. Adviser Office integrates
and links with over 70 partners, including product providers, portals and fund supermarkets, to
aggregate client data and avoid data re-keying.
o Exchange Portal. The Exchange Portal is the largest provider of online life and pension quotations
in the UK. The Portal has over 30,000 registered users. It provides online information and
transaction services, with over a third of a billion client quotations processed between March 2008
and March 2011.
o Avelo Trigold. Around two thirds of the UK’s mortgage advisors use the software to research and
apply for mortgages on their customers’ behalf.
o Mortgage Sales & Originations. It is estimated that one in four UK mortgages currently touch
Avelo’s point-of-sale and originations systems.
Even though Avelo’s applications are thriving as individual successes, the applications are separate and do
not interoperate beyond customer-specific integrations. Avelo is now developing a common enterprise
data model so that it can align its applications and support them with a single strategic platform.
What is Big Data?
Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand
database management tools or traditional data processing applications. The challenges include cost,
capture, storage, search, sharing, transfer, analysis, visualization and return-on-investment. As of 2012,
limits on the size of data sets that are feasible to process in a reasonable amount of time were in the order
of Exabytes (see Glossary) of data. Data sets are growing in size because they are being gathered by
ubiquitous, information-hungry mobile devices, global social media sites, increasingly sophisticated
consumer-facing company websites and swathes of Business-to-Business (B2B) and Business-to-Consumer
(B2C) Business Intelligence (BI) processes.
Big Data is difficult to work with using most relational database management systems and desktop
statistical software and visualization packages; instead it requires massively parallel software running on
tens, hundreds, or even thousands of servers.
However, Big Data is more than simply a matter of size; it is an opportunity to find insights in new and
emerging types of data and content, to make your business more agile and to answer questions that were
previously considered beyond your reach. Until now, there was no practical way to harvest this
opportunity.
Business Benefits of Big Data
There are compelling business reasons for developing Big Data analytical capabilities:
o Performance Management
Performance management involves understanding the meaning of data in company databases
using pre-determined queries and multidimensional analysis. The data used for this analysis are
transactional, for example, years of customer purchasing activity or inventory levels and turnover.
Managers can ask questions such as “Which are our most profitable customer segments?” and get
answers in real time that can be used to help make short-term business decisions and longer-term
plans. The main challenge is to ensure the quality and completeness of transactions entered into
the system or the result will be “garbage in, garbage out.” Also, to guarantee a complete picture of
the business, multiple databases across functions have to be integrated.
o Data Exploration
Data exploration makes heavy use of statistics to experiment and get answers to questions that
managers might not have thought of previously. This approach leverages predictive modelling
techniques to predict user behaviour based on their previous business transactions and
preferences. Cluster analysis (see Glossary) can be used to segment customers into groups based
on similar attributes; a minimal clustering sketch appears after this list of strategies. Once these
groups are discovered, managers can perform targeted actions such as customizing marketing
messages, upgrading services and cross-selling or up-selling to each unique group. Another popular
use case is to predict which group of users may “drop out.” Armed with this information, managers
can proactively devise strategies to retain this user segment and lower the churn rate. With an
increased emphasis on digital inbound marketing, organizations want to attract prospects to their
websites with engaging, robust, and targeted content. By running experiments on these groups,
managers can predict which combination of variables will lead to the highest conversion rate of site
visitors to qualified leads, and of qualified leads to customers.
Target, the large US retailer, used data mining techniques to predict the buying habits of clusters
of customers that were going through a major life event. Predicting customers who are going
through big life changes such as pregnancy, marriage, and divorce, is important to retailers since
these customers are most likely to be flexible and change their buying habits, making them ideal
targets for advertisers. Target was able to identify roughly 25 products, such as unscented lotion
and vitamin supplements, that when analysed together, helped determine a “pregnancy
prediction” score. Target then sent promotions focused on baby-related products to women based
on their pregnancy prediction score. The result: sales of Target’s Mum and Baby products sharply
increased soon after the launch of their new advertising campaigns. Target had to adjust how it
communicated this promotion to women who were most likely pregnant, once it had learned that
the initial advertising had made some of them upset. As a result, Target made sure to include
advertisements that were not baby-related so the baby ads would look random.
o Social Analytics
Social analytics measure the vast amount of non-transactional data that exists today. Much of this
data exists on social media platforms, such as conversations and reviews on Facebook, Twitter, and
Google+. Social analytics measure three broad categories: awareness, engagement, and word-of-
mouth or reach.
Awareness looks at the exposure or mentions of social content and often involves metrics such as
the number of video views and the number of followers or community members.
Engagement measures the level of activity and interaction among platform members, such as the
frequency of user-generated content. More recently, mobile applications and platforms such as
Foursquare (www.foursquare.com) provide organizations with location-based data that can
measure brand awareness and engagement, including the number and frequency of check-ins, with
active users rewarded with badges.
Reach measures the extent to which content is disseminated to other users across social platforms.
Reach can be measured with variables such as the number of re-tweets on Twitter and shared likes
on Facebook.
Social metrics are critical since they help inform managers of the success of their external and
internal social digital campaigns and activities. For example, marketing campaigns involving
contests and promotions on Facebook can be assessed through the number of consumer ideas
submitted and the community comments related to those ideas. If the metrics indicate poor
results, managers can pivot and make changes.
With recent advancements in social measurement techniques, we can now calculate one’s “digital
footprint” in the social media world. Companies like PeerIndex (www.peerindex.com) and Klout
(www.klout.com) can measure a digital user’s social influence. A Klout score ranges from 1 to 100,
based on an algorithm involving the number of followers, re-tweets, the influence of those followers
themselves and other variables. Marketers are using social metrics to identify “influencers,” those
well-followed individuals who are discussing their particular brand and can serve as a brand
advocate.
Using Klout’s services, Virgin America identified 120 individuals with high Klout scores and offered
them a free flight to promote their new Toronto route. These individuals were under no obligation
to write about their experience but between these 120 individuals and another 144 engaged
influencers, the campaign resulted in a total of 4,600 tweets, 7.4 million impressions and coverage
in top news outlets. The campaign thus created strong brand awareness of the new airline route.
o Decision Science
Decision science involves experiments and analysis of non-transactional data, such as consumer-
generated product ideas and product reviews, to improve the decision-making process. Unlike
social analysts, who use social analytics to measure known objectives, decision scientists explore
social Big Data as a way to conduct “field research” and to test hypotheses. Crowdsourcing,
including idea generation and polling, enables companies to pose questions to the community
about their products and brands. Decision scientists, in conjunction with community feedback,
determine the value, validity, feasibility and fit of these ideas and eventually report on whether and
how they plan to put them into action.
The My Starbucks Idea program enables consumers to share, vote on, and submit ideas regarding
Starbucks’ products, customer experience, and community involvement. Over 100,000 ideas have
been collected to date. Starbucks has an “Ideas in Action” section to show where ideas sit in the
review process. Many of the techniques used by decision scientists involve listening tools that
perform text and sentiment analysis. By leveraging these tools, companies can measure specific
topics of interest around their products, as well as who is saying what about those topics. For
example, before a new product is launched, marketers can measure how consumers feel about
price, the impact that demographics may have on sentiment, and how price sentiment changes
over time. Managers can then adjust prices based on these tests.
In 2009 Whirlpool, the manufacturer of home appliances, wanted to discover what its customers
and consumers were saying about its products and services on social media platforms. It used
Attensity360 (www.attensity.com) for continuous monitoring and analysis of conversations
across popular channels such as Facebook, Twitter and YouTube, review and blogger sites, and
mainstream news. Attensity’s text analytics findings were incorporated into Whirlpool’s decision
models to accurately predict customer churn, loyalty, and satisfaction. This process enabled the
company to listen, respond, and measure on a scale unobtainable by manual methods. As a result,
Whirlpool improved its understanding of its overall business, increased customer satisfaction and
responded to customers faster. The
company also incorporated customer feedback to improve its product development and planning
process.
While technology has helped companies scale the listening process involving social Big Data, the
accuracy of listening tools is nowhere near perfect. Manual work is needed to “train” these
technologies on company- and industry-specific keywords for text and sentiment analysis.
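As noted under Data Exploration above, cluster analysis is a common way to segment customers into groups with similar attributes. The sketch below is a minimal illustration using scikit-learn’s KMeans on a small synthetic table of customer attributes; the attribute names, values and the choice of three segments are assumptions made purely for illustration, not Avelo or client data.

```python
# A minimal customer-segmentation sketch using k-means clustering.
# The feature names, values and number of segments are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customer attributes: [annual_spend, purchases_per_year, years_as_customer]
customers = np.array([
    [12000, 45, 8],
    [300,    2, 1],
    [9500,  38, 6],
    [450,    3, 2],
    [22000, 60, 10],
    [700,    5, 1],
])

# Standardise so no single attribute dominates the distance metric.
features = StandardScaler().fit_transform(customers)

# Group customers into three segments (k chosen here purely for illustration).
model = KMeans(n_clusters=3, n_init=10, random_state=42)
segments = model.fit_predict(features)

for row, segment in zip(customers, segments):
    print(f"spend={row[0]:>6}, purchases={row[1]:>2}, tenure={row[2]:>2} -> segment {segment}")
```

Once segments have been identified, the targeted actions described above (tailored messaging, cross-selling and up-selling, churn intervention) can be planned and measured per segment.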
With respect to future trends in the Big Data field, the following practice is starting to emerge:
o Integrating multiple big data strategies.
While a company can be effective with a single Big Data strategy, the most effective companies
leveraging Big Data today are combining strategies. For example, one financial institution is
leveraging both Social Analytics (non-transactional social data) and Performance Management
(business intelligence using transactional data) strategies to guide its customer service. The bank
traditionally determined its “top” customers based on metrics such as number and balance of
accounts; these were the customers who received premium service. Now, the bank is planning to
incorporate social metrics into the equation. Those online customers who are very active with
respect to mentioning, engaging with, and promoting the bank on social channels will also be
considered for high-level service programs. The financial institution believes this is a much more
balanced way to segment its most influential customers for customer service.
Big Data within Financial Services
The Financial Services industry is amongst the most data-driven of industries. The regulatory environment
that commercial banks and insurance companies operate within requires these institutions to store and
analyse many years of transaction data. For the most part, financial services firms have relied on relational
technologies coupled with business intelligence tools to handle this ever-increasing data and analytics
burden. It is, however, increasingly clear that while such technologies will continue to play an integral role,
new technologies – many of them developed in response to the data analytics challenges first faced in e-
commerce, internet search and other industries – have a transformative role in enterprise data
management. The challenge, as outlined in ‘What is Big Data?’ above, is not only one of sheer data volumes
but also of data variety and the timeliness with which such varied data needs to be aggregated and
analysed.
As data-driven as financial services companies are, analysts estimate that somewhere between 80 and 90
percent of the data that financial services firms hold is unstructured, i.e. in documents and in text form.
Technologies that enable businesses to marry this data with structured content present an enormous
opportunity for improving business insight. Take, for example, information stored in insurance claims
systems. Much valuable information is captured in text form. The ability to parse text information and
combine the extracted information with structured data in the claims database will not only enable a firm
to provide a better customer experience; it may also enhance its fraud detection capabilities. These and
other data management related challenges and opportunities have been succinctly captured and classified
by others under the ‘Four Vs’ of data – Volume, Velocity, Variety and Value.
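To make the idea of marrying unstructured claim notes with structured claim records concrete, here is a minimal sketch that scans free-text notes for simple indicator phrases and joins the result to a structured claims table using pandas. The field names, indicator phrases and thresholds are hypothetical and are not drawn from any insurer’s systems.

```python
# A minimal sketch of combining unstructured claim notes with structured claim data.
# All field names, indicator phrases and the review thresholds are hypothetical.
import pandas as pd

claims = pd.DataFrame({
    "claim_id": [101, 102, 103],
    "claim_amount": [1200.0, 48000.0, 5300.0],
    "policy_age_days": [2400, 35, 900],
})

notes = pd.DataFrame({
    "claim_id": [101, 102, 103],
    "note_text": [
        "Customer reported minor water damage to kitchen.",
        "Third party unreachable; receipts missing; prior claim withdrawn.",
        "Vehicle collision, police report attached.",
    ],
})

INDICATOR_PHRASES = ["receipts missing", "unreachable", "prior claim withdrawn"]

def indicator_count(text: str) -> int:
    """Count how many indicator phrases appear in a claim note."""
    lowered = text.lower()
    return sum(phrase in lowered for phrase in INDICATOR_PHRASES)

notes["text_indicators"] = notes["note_text"].apply(indicator_count)

# Marry the signal extracted from the text with the structured record.
combined = claims.merge(notes[["claim_id", "text_indicators"]], on="claim_id")

# A naive review rule: large claims on young policies with several text indicators get a second look.
combined["needs_review"] = (
    (combined["claim_amount"] > 10000)
    & (combined["policy_age_days"] < 180)
    & (combined["text_indicators"] >= 2)
)
print(combined)
```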
The visionary financial services institution needs to deliver business insights in context, on demand, and at
the point of interaction by analyzing every bit of data available. Big Data technologies comprise the set of
technologies that enable those institutions to deliver on that vision. To a large extent, these technologies
are made feasible by the rising capabilities of commodity hardware, the vast improvements in storage
technologies, and the corresponding fall in the price of computing resources.
The ABN AMRO experience (November 2011):
Banks are traditionally considered the most advanced in data management. Highly transactional and
digitally sophisticated, some banks are in fact difficult to distinguish from IT firms. They invest heavily in
data infrastructure, as well as in the skills needed to analyse and interpret digital information. “Analysing
financial data is the starting point of any financial institution,” said Paul Scholten, chief operating officer
(COO) of ABN AMRO’s retail and private banking business. ABN AMRO has clean, complete financial data
on both its customers and its internal operations. The bank captures nearly everything (for regulatory
purposes) but uses only the most valuable data for insight, although it actively seeks out new sources of
data. However, there are challenges, and Mr Scholten points to three obstacles that businesses across the
financial services sector are facing:
1. Privacy - “We have the data and tools that can help our customers understand their spending
habits at a deep level,” he says. “We can help them analyse their investment strategies,
understand their tax situation better and save money. But we run into privacy issues with these
things, and we have to be careful about what belongs to us, what belongs to customers and what
belongs to the government.”
2. Unstructured data (see Glossary) - “We are used to structured, financial data,” he says. “We are
not so good at the unstructured stuff.” He says the company is just beginning to understand the
uses of social media, and what might be possible in terms of improving customer service.
3. Combining data across functions to yield new insights – Although ABN AMRO has an advanced risk
analysis department, it does not cross-reference this data with marketing, regulatory or
customer data sets. “We are working on that,” he says. “There is value to be had there.” In
particular, Mr Scholten says that cross-referencing client complaints with operational risk might
yield deeper insight into how operational problems affect customer service.
The Avelo Big Data project
Avelo is unique in that it sits at the heart of the Financial Services sector in the UK and is the conduit
through which most Banks, Insurance Companies and Intermediaries conduct their business. As a
consequence of its position, Avelo is perfectly placed not only to use its domain knowledge to analyse and
interpret digital information, but also to leverage its deep understanding of technology to assemble the
best infrastructure for any Big Data project.
The two-dimensional matrix below provides a convenient, albeit incomplete, starting framework for
decomposing the high-level technology requirements for managing Big Data. The vertical axis shows the
degree to which data is structured: data can be unstructured, semi-structured or structured (see
Glossary). The horizontal axis shows the lifecycle of data: data is first acquired and stored, then
organized and finally analysed for business insight.
Source: Oracle white paper, June 2012, Financial Services Data Management: Big Data Technology in
Financial Services.
The diagram suggests that a myriad of disparate technologies is needed to handle Big Data requirements
comprehensively across the enterprise; however, these are not ‘either/or’ technologies. They are to be
viewed as parts of a data management continuum: each technology enjoys a set of distinct advantages
depending on the phase in the data management lifecycle and on the degree of structure within the data
it needs to handle, and so these technologies work together within the scope of an enterprise
architecture.
Avelo has researched the market and developed partnerships with companies that are best able to deliver
Big Data projects across both the horizontal and the vertical axes. Those companies chose to work with
Avelo because of Avelo’s domain knowledge. Whereas many industry participants (Banks and Insurance
Companies) may have deeper knowledge of particular segments of the market, they lack the holistic view
that is essential in the analysis and interpretation of Big Data. For example, practices in one part of the
industry can be used to improve and inform practices in another; without a holistic view this education
would not be possible, and it is this sort of activity that makes the most of Big Data advances.
The following are some examples of the Big Data related projects that Avelo is working on:
o Structured data – Avelo is looking at the client data that passes through discrete sets of its
software (e.g. the Avelo Exchange) to see if it can gather, organize and build advanced analytics
that help its customers run their businesses more efficiently, be it from a pricing, distribution or
product development perspective.
o Semi-structured data – Avelo is looking at the data that passes through all of its products to see if
it can build a more holistic view of consumer behaviour within financial services. It believes it can
blend data sets to give more depth to customer insight while also maintaining a higher freshness
value (see Glossary); a minimal freshness-weighting sketch follows this list. It knows this data will
be valuable to its clients, giving almost real-time insights into customer behaviour.
o Unstructured data – Avelo is considering working with third parties (e-aggregators and affinity
groups) to see if it can blend structured and semi-structured data with them. It will also
investigate whether consumers would want to upload their social media and other unstructured
data into Avelo’s environment, so that Avelo can provide timely, pertinent insights into each
consumer’s financial status and planning.
Avelo wants to be seen to be helping consumers of financial services make better, more informed
decisions. Financial services can feel overwhelming to consumers, as the concepts are often
abstract and in some cases unlike any other experience in their lives. As a consequence, decisions
are in many cases not given the full attention they deserve. Avelo believes it can help consumers
by building predictive analytics that sort through the masses of financial data, making decisions
easier by delivering only the most relevant information, which consumers can use to make
comparisons, inform decisions and ultimately fulfil.
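As referenced in the semi-structured data item above, one simple way to preserve a ‘freshness value’ when blending data sets is to weight each observation by its recency before aggregating per customer. The sketch below is a hypothetical illustration; the source names, half-life and scores are invented and do not reflect Avelo’s actual blending logic.

```python
# A minimal sketch of blending observations of the same customer from two products,
# weighting each observation by recency ("freshness"). All names, scores and the
# half-life are hypothetical.
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # assumed: an observation loses half its weight every 30 days

def freshness_weight(observed_at: datetime, now: datetime) -> float:
    """Exponential decay of an observation's weight with its age in days."""
    age_days = (now - observed_at).total_seconds() / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

# Observations of the same customer from two different products.
observations = [
    {"customer_id": "C1", "source": "exchange_portal", "interest_score": 0.9,
     "observed_at": datetime(2013, 3, 1, tzinfo=timezone.utc)},
    {"customer_id": "C1", "source": "adviser_office", "interest_score": 0.4,
     "observed_at": datetime(2013, 1, 5, tzinfo=timezone.utc)},
]

now = datetime(2013, 3, 10, tzinfo=timezone.utc)
weights = [freshness_weight(o["observed_at"], now) for o in observations]
blended = sum(w * o["interest_score"] for w, o in zip(weights, observations)) / sum(weights)
print(f"Blended, freshness-weighted interest score for C1: {blended:.2f}")
```

Recent observations dominate the blended score, so the insight stays close to the customer’s current behaviour while older signals still contribute depth.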
Constructing a Big Data capability at Avelo
Avelo defines a Big Data capability as the roles, technologies, processes, and culture needed to support Big
Data initiatives. Perhaps the most critical of these are the roles, and in particular, the expertise and
experience needed to devise and implement Big Data strategies. Multiple roles are needed:
• statisticians who are skilled in the latest statistical techniques;
• analysts and decision scientists who understand business measurement and experimentation and
who can be the broker between statisticians and business managers;
• the IT group, which provides guidance on selecting Big Data technologies and techniques and
which integrates business intelligence tools with transactional systems such as CRM and with
Web analytics tools; and
• business managers and knowledge workers who own the business process and have to be
comfortable with the new “language” of Big Data and social analytics.
Building the algorithms
At Avelo this is an iterative process: domain expertise is used to build the algorithms, but the algorithms
have to be tested and exercised regularly to ensure they remain fit for purpose. There have been
instances where Avelo devised an algorithm that had a high probability of identifying the target group but
relied on a data set common to only a small proportion of the population. To extract value from the
exercise, Avelo had to reduce the number of variables in the algorithm so that it applied to a larger
population, but by doing so it lost some accuracy. Over time, through a number of iterations and by
finding more variables and data to mine, the algorithm was enhanced. In many instances the resulting
algorithms are vastly different from the ones Avelo started with, but in all cases the latest incarnations are
the strongest: they have evolved through this iterative process while benefitting from Avelo’s domain
knowledge and deep data expertise. These algorithms are what set Avelo apart from any other provider of
Big Data analytics within Financial Services.
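The trade-off described above, widening an algorithm’s coverage by dropping variables at the cost of some accuracy, can be made concrete with a toy sketch. The rules, fields and records below are invented purely for illustration and are not Avelo’s algorithms.

```python
# A toy illustration of the coverage-vs-accuracy trade-off when simplifying an
# algorithm. The rules, fields and records are invented for illustration only.

population = [
    # Each record: available fields plus the true label we are trying to predict.
    {"age": 34, "has_mortgage": True,  "num_dependants": 2,    "target": True},
    {"age": 29, "has_mortgage": False, "num_dependants": 0,    "target": False},
    {"age": 41, "has_mortgage": True,  "num_dependants": None, "target": True},
    {"age": 52, "has_mortgage": None,  "num_dependants": 1,    "target": False},
    {"age": 38, "has_mortgage": True,  "num_dependants": 3,    "target": True},
    {"age": 45, "has_mortgage": True,  "num_dependants": 0,    "target": False},
]

def full_rule(person):
    """Uses every variable; returns None when a field is missing (cannot score)."""
    if person["has_mortgage"] is None or person["num_dependants"] is None:
        return None
    return person["has_mortgage"] and person["num_dependants"] >= 1 and person["age"] < 50

def reduced_rule(person):
    """Drops the dependants variable so more of the population can be scored."""
    if person["has_mortgage"] is None:
        return None
    return person["has_mortgage"] and person["age"] < 50

def evaluate(rule, people):
    scored = [(rule(p), p["target"]) for p in people if rule(p) is not None]
    coverage = len(scored) / len(people)
    accuracy = sum(pred == truth for pred, truth in scored) / len(scored)
    return coverage, accuracy

for name, rule in [("full rule", full_rule), ("reduced rule", reduced_rule)]:
    coverage, accuracy = evaluate(rule, population)
    print(f"{name}: coverage={coverage:.0%}, accuracy={accuracy:.0%}")
```

On this toy data the reduced rule scores more of the population but misclassifies one case, which is exactly the kind of result that drives the next iteration of variable hunting described above.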
Overcoming the legal hurdles
Companies need to keep up with policies and guidelines surrounding the use of Big Data, especially with
non-transactional, social data that is often created and accessed outside company walls. Big Data policies
should address issues regarding compliance, privacy, and security. Leading organizations must
communicate and be honest in telling customers and consumers how they are using personal data, such as
demographic information and past purchases. A rule of thumb is always to consider the customer or
employee experience and the personal benefit they gain from Big Data projects. Big Data projects that
create a negative experience for users, despite the benefits to the company, should be redesigned.
What is Avelo’s protection gap Big Data offering?
The Protection Gap in the UK is increasing. Swiss Re's Term and Health Watch 2012 has found that the Life
Assurance Protection Gap (the gap between what consumers should have for adequate cover and what
they actually have) for the UK has increased by 20% from £2.0 trillion to £2.4 trillion over the past ten
years. On average, this gap amounts to around £100,000 per person, with the amount of under-insurance
greatest among single parents, couples with children and those aged 35 and under. Similarly, the Income
Protection Gap has increased by 46% over the same decade, from £130 billion of annual benefit in 2002 to
£190 billion in 2012.
Swiss Re's UK CEO, Russell Higginbotham, says: "Under-insurance in both the life insurance and income
protection areas is proving to be a long-term problem in the UK. The industry is faced with the challenge
of better communicating to consumers how to alleviate the financial burden placed on families and
dependents in difficult times. Simple life assurance cover is not expensive; for example, a healthy 35-year-
old male non-smoker would only pay around £2 per week for £100,000 of life cover to age 65. "
The protection gap theme was also taken up by Swiss Re's Chief Executive Officer of Reinsurance, Christian
Mumenthaler, who looked at how life insurers can address the sales and distribution challenges associated
with new business.
"As our new study shows, protection cover is not top of consumers' minds when they think about
providing for their dependents' financial needs or their own ill health or disability. One of the major
barriers is that consumers think the sales process is too complicated," he said.
He used statistics drawn from Swiss Re's Magnum system to show how potential life insurance
policyholders can be put off buying life cover when subjected to a lengthy underwriting process.
Comparing the conversion rates of sales made by intermediaries, the success rate for cases accepted at
point of sale at standard rates is around 90%. This falls to about 70% when referred to an underwriter –
consumers can easily lose interest.
Mumenthaler explained that by harnessing the right data – such as health club memberships, buying
patterns at the local supermarket and so on – insurers could automatically "pre-select" consumers who
appear to represent a low risk, and who could be put on risk immediately at the point of sale without the
burden of underwriting.
Avelo, with its partners, is developing a service that targets individuals with a protection gap. Using its
domain experience, and information gleaned from its products, Avelo is building algorithms that not only
identify groups of consumers with a protection gap but also assess whether they are simple or complex
cases from an underwriting perspective. Avelo is also able to benchmark those groups against other
consumers who were in a similar situation but took action to close the gap. Avelo believes that this data,
offered in a direct-to-consumer channel, calibrated with the latest thinking in behavioural science and
supported by a co-browsing or telephone advice experience, is unique in the market and could be part of
the solution to the Protection challenge in the UK.
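Avelo’s actual algorithms are proprietary, so purely to make the idea concrete, the sketch below shows one hypothetical way a protection gap and a crude case-complexity flag might be estimated from a handful of fields. The income multiplier, thresholds and field names are invented and are not Avelo’s method.

```python
# A purely hypothetical sketch of estimating a life-cover protection gap and a
# rough case-complexity flag. The multiplier, thresholds and fields are invented
# and do not represent Avelo's algorithms.
from dataclasses import dataclass

INCOME_MULTIPLIER = 10  # assumed: target cover of ten times annual income

@dataclass
class Consumer:
    annual_income: float
    existing_life_cover: float
    dependants: int
    smoker: bool
    declared_conditions: int  # count of declared medical conditions

def protection_gap(c: Consumer) -> float:
    """Target cover minus existing cover; zero if already adequately covered."""
    target_cover = c.annual_income * INCOME_MULTIPLIER
    return max(0.0, target_cover - c.existing_life_cover)

def is_simple_case(c: Consumer) -> bool:
    """Crude proxy for 'likely acceptable at standard rates at point of sale'."""
    return (not c.smoker) and c.declared_conditions == 0

consumers = [
    Consumer(annual_income=32000, existing_life_cover=50000, dependants=2,
             smoker=False, declared_conditions=0),
    Consumer(annual_income=45000, existing_life_cover=400000, dependants=1,
             smoker=True, declared_conditions=1),
]

for c in consumers:
    gap = protection_gap(c)
    if gap > 0:
        kind = "simple" if is_simple_case(c) else "complex"
        print(f"Gap of £{gap:,.0f} ({kind} case, {c.dependants} dependants)")
```

In a real service the inputs would come from the data sources described earlier, and the “simple case” test would be replaced by proper underwriting rules rather than a two-field proxy.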
What other Big Data offerings could Avelo explore?
Aviva, in a September 2010 report, stated that the UK – and indeed the whole of Europe – is facing a
retirement funding shortfall. The report was produced to quantify the pensions gap, to open up the
debate and to find solutions. The pensions gap refers to the difference between the income needed to live
comfortably in retirement, and the actual income individuals can currently expect.
The study, carried out by Aviva in conjunction with Deloitte, showed that the annual European pensions
gap for individuals retiring over the next 40 years is £1.6 trillion. The UK’s pensions gap stood at £318
billion and equated to an average of £10,300 per UK adult annually - the largest per person total across all
European countries studied.
Aviva believes that the savings gap could be tackled by engaging and empowering people with:
1. annual statements that show a forecast of pension retirement income from all sources, in one
place; and
2. interactive tools (like a pension calculator) demonstrating how saving more, earlier in life, can
improve their retirement income.
This would give people a greater understanding of what they are likely to receive, thereby encouraging
them to save during their working lives for a better income in retirement. A minimal sketch of such a
calculator follows.
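The sketch below projects a pension pot from regular contributions with compound growth, converts it to an income using an assumed annuity rate and compares the result with a target income. The growth rate, annuity rate, state pension figure and targets are illustrative assumptions only, not Aviva’s or Avelo’s figures.

```python
# A minimal pension-gap calculator sketch. Growth rate, annuity rate, state pension
# and target income are illustrative assumptions, not advice or Aviva/Avelo figures.

def projected_pot(monthly_contribution: float, years: int, annual_growth: float) -> float:
    """Future value of regular monthly contributions with monthly compounding."""
    monthly_rate = annual_growth / 12
    months = years * 12
    return monthly_contribution * (((1 + monthly_rate) ** months - 1) / monthly_rate)

def pension_gap(monthly_contribution: float, years_to_retirement: int,
                target_annual_income: float, state_pension: float = 5700.0,
                annual_growth: float = 0.05, annuity_rate: float = 0.05) -> float:
    """Annual income needed in retirement minus projected income from all sources."""
    pot = projected_pot(monthly_contribution, years_to_retirement, annual_growth)
    projected_income = pot * annuity_rate + state_pension
    return max(0.0, target_annual_income - projected_income)

# Example: saving £150/month for 30 years towards a £20,000 target income.
gap = pension_gap(monthly_contribution=150, years_to_retirement=30,
                  target_annual_income=20000)
print(f"Estimated annual pension gap: £{gap:,.0f}")

# Saving more, earlier, narrows the gap (the behaviour Aviva wants to demonstrate).
gap_higher = pension_gap(monthly_contribution=300, years_to_retirement=30,
                         target_annual_income=20000)
print(f"Gap if contributions double: £{gap_higher:,.0f}")
```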
In the same way that Avelo is developing a service to tackle the Protection Gap in the UK, it could also use
a similar process to tackle the Pension Gap. Again, using its domain experience and information gleaned
from its products, Avelo could build algorithms that identify groups of consumers with a pension gap.
These groups could be benchmarked against other groups of consumers that were in a similar situation
but took action to close the gap. The benefits of closing the gap could be quantified in real-life stories and
used to persuade others to take action. Again, Avelo believes that this data, offered in a direct-to-consumer
channel that calculates forecasts of pension retirement income from all sources and is supported by a
co-browsing or telephone advice experience, would be unique in the market and could be part of the
solution to the Pensions challenge in the UK.
Summary
With the cost of data capture and acquisition decreasing at a rapid rate, the real value of Big Data is in its
use. Companies that effectively create and implement Big Data strategies, such as those described in this
article, stand to gain a competitive advantage. Avelo has made a business decision that it wants to explore
the Big Data opportunities within Financial Services, build a reputation for excellence in that field and help
its clients develop a competitive edge.
We hope you find this white paper informative; should you require any further information, please do not
hesitate to contact Avelo.
References
Parise, S., Iyer, B., & Vesset, D. (July/August 2012). Four strategies to capture and create value from Big
Data. Ivey Business Journal (www.iveybusinessjournal.com/topics/strategy/four-strategies-to-capture-
and-create-value-from-big-data).
Vesset, D., Morris, H.D., Little, G., Borovick, L., Feldman, S., Eastwood, M., Woo, B., Villars, R.L., Bozman,
J.S., Olofson, C.W., Conway, S., & Yezhkova, N. (March 2012). IDC market analysis: Worldwide Big Data
technology and services, IDC #233485.
Duhigg, C. (16 February 2012). How companies learn your secrets. The New York Times.
Hoffman, D.L., & Fodor, M. (2010). Can you measure the ROI of your social media marketing? MIT Sloan
Management Review, 52(1), 41-49.
Schaefer, M.W. (2012). Return on Influence. McGraw-Hill, pp. 5-6.
IDC Customer Spotlight (March 2011). Whirlpool Corporation’s digital detectives: Attensity provides the lens.
www.sas.com/knowledge-exchange/risk/integrated-risk/big-data-a-big-bummer-for-financial-services
www.ibm.com/software/data/bigdata/
Oracle white paper (June 2012). Financial Services Data Management: Big Data Technology in Financial
Services.
Swiss Re, Term & Health Watch 2013.
Aviva (September 2010). Mind the Gap: Quantifying the pension gap in the UK.
Glossary
Some of the following terms are used in this document; others you will come across when discussing
‘Big Data’ with industry participants.
• Apache Software Foundation (www.apache.org) - provides support for the Apache community of
open-source software projects (this includes Hadoop and Subversion – see below), which provide
software products for the public good.
• Cluster - cluster analysis or clustering is the task of grouping a set of objects in such a way that
objects in the same group (a cluster) are more similar (in some sense or another) to each other than
to those in other groups (clusters). It is a main task of exploratory data mining, and a common
technique for statistical data analysis used in many fields.
• Cluster analysis - in itself is not one specific algorithm, but the general task to be solved. It can be
achieved by various algorithms that differ significantly in their notion of what constitutes a cluster
and how to efficiently find them. Clustering can be formulated as a multi-objective optimization
problem. Cluster analysis as such is not an automatic task, but an iterative process of knowledge
discovery or interactive multi-objective optimization that involves trial and failure. It will often be
necessary to modify data pre-processing and model parameters until the result achieves the desired
properties.
• Data Artisan – people who understand data analysis and are business savvy.
• Data Scientist – people who incorporate varying elements and build on techniques and theories
from many fields, including maths, statistics, data engineering, pattern recognition and learning,
advanced computing, visualization, uncertainty modelling, data warehousing, and high performance
computing with the goal of extracting meaning from data and creating data products.
• DBMS (DW) – Database Management System Data Warehouse.
• DBMS (OLTP) – Database Management System Online Transaction Processing.
• Distributed File Systems – In computing, a distributed file system or network file system is any file
system that allows access to files from multiple hosts.
• Exabyte (EB) – An Exabyte (EB) is a large unit of computer data storage. The prefix ‘Exa’ means one
billion billion or one quintillion. An Exabyte is approximately one quintillion Bytes or a billion
Gigabytes. An Exabyte of storage could contain 50,000 years' worth of DVD-quality video. The
world's technological capacity to store information grew from 2.6 (optimally compressed) Exabytes
in 1986 to 15.8 in 1993 to over 54.5 in 2000. As of 2012, about 2.5 Exabytes of data are created each
day, and that number is doubling every 40 months. 90% of the data in the world today has been
created in the last two years (www.ibm.com/software/data/bigdata/). A Byte (B) is smaller than a
Megabyte (MB), which is smaller than a Gigabyte (GB), which is smaller than a Terabyte (TB), which is
smaller than a Petabyte (PB), which is smaller than an Exabyte (EB), which is smaller than a Zettabyte
(ZB), which is smaller than a Yottabyte (YB).
• Freshness value – i.e. how fresh is the data and the insight? For triggers and nudges to be valuable
the data must have a high ‘freshness value’.
• Google BigQuery – allows you to run SQL-like queries against very large datasets, with potentially
billions of rows. This can be your own data, or data that someone else has shared with you.
BigQuery works best when analysing very large datasets, typically using a small number of very large,
append-only tables. For more traditional relational database scenarios, you might consider using
Google Cloud SQL instead. You can use BigQuery through a web UI called the BigQuery browser tool,
the bq command-line tool, or by making calls to the REST API using various client libraries in multiple
languages, such as Java, Python, etc.
• Google Dremel - Dremel is a scalable, interactive ad-hoc query system for analysis of read-only
nested data. By combining multi-level execution trees and columnar data layout, it is capable of
running aggregation queries over one trillion-row tables in seconds. The system scales to thousands
of CPUs and Petabytes of data, and has thousands of users at Google.
• Google File System – a proprietary distributed file system developed by Google Inc. for its own use.
• Hadoop (HD) (www.hadoop.apache.org) - is a distributed computing platform written in Java. It
incorporates features similar to those of the ’Google File System’ and of ‘MapReduce’.
a. What Hadoop is not – The developers behind Hadoop see a lot of emails where people hear
about Hadoop and think it will be the silver bullet to solve all their application/ data centre
problems. It is not. It solves some specific problems for some companies and organisations, but
only after they have understood the technology and where it is appropriate. If you start using
Hadoop in the belief it is a drop-in replacement for your database or SAN file system, you will be
disappointed.
b. Hadoop is not a substitute for a database - Databases are wonderful. Issue an SQL SELECT call
against an indexed/ tuned database and the response comes back in milliseconds. Want to
change that data? SQL UPDATE and the change is in. Hadoop does not do this. Hadoop stores
data in files, and does not index them. If you want to find something, you have to run a
‘MapReduce’ job going through all the data. This takes time, and means that you cannot directly
use Hadoop as a substitute for a database. Where Hadoop works is where the data is too big for
a database (i.e. you have reached the technical limits, not just that you don't want to pay for a
database license). With very large datasets, the cost of regenerating indexes is so high you can't
easily index changing data. With many machines trying to write to the database, you can't get
locks on it. Here the idea of vaguely-related files in a distributed file system can work. There is a
high performance column-table database that runs on top of Hadoop HDFS: Apache HBase. This
is a great place to keep the results extracted from your original data.
c. MapReduce (MR) is not always the best algorithm - MapReduce is a profound idea: taking a
simple functional programming operation and applying it, in parallel, to Petabytes of data. But
there is a price. For that parallelism, you need to have each MR operation independent from all
the others. If you need to know everything that has gone before, you have a problem.
Although, such problems can be aided by iteration and shared state information.
• HANA – SAP’s in-memory data platform for large-scale, real-time analytics; unlike Hadoop, it is a
commercial in-memory database rather than a distributed file-based framework.
• In-Memory Data Grids (IMDG) - designed to store data in main memory, ensure scalability and store
an object itself. There are many IMDG products, both commercial and open source.
• Key-Value Stores - have records which consist of a pair including a key and a payload.
• MapReduce (MR) – the key algorithm that the Hadoop (HD) engine uses to distribute work around
a cluster (a minimal word-count illustration appears at the end of this glossary).
• Nested data - In general, something that is nested is fully contained within something else of the
same kind. In programming, nested describes code that performs a particular function that is
contained within code that performs a broader function. One well-known example is the procedure
known as the nested do-loop. In data structures, data organizations that are separately identifiable
but also part of a larger data organization are said to be nested within the larger organization. A
table within a table is a nested table. A list within a list is a nested list.
• Nudges – At Avelo, ‘Nudges’ are the medium (e-mail, text, social media interaction) that, when
informed by a Trigger or Pre-Trigger and used intelligently, increases the probability of persuading a
consumer to fulfil through a D2C website.
• Pre-Triggers – At Avelo, these are models that predict a ‘Trigger’(see Triggers below); they provide a
probability score of that ‘Trigger’ happening.
• Structured data – The term structured data refers to data that is identifiable because it is organized
in a structure. The most common form of structured data or structured data records (SDR) is a
database where specific information is stored based on a methodology of columns and rows.
• Subversion (www.subversion.apache.org) – Subversion exists to be universally recognized and
adopted as an open-source, centralized version control system characterized by its reliability as a
safe haven for valuable data; the simplicity of its model and usage; and its ability to support the
needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.
• Triggers – At Avelo, ‘Triggers’ are events that spur a consumer into taking action, e.g. buying life
insurance. That event could be a life-changing moment, e.g. the birth of a child or the death of a
close relative (also see Pre-Triggers above):
[Table: example Triggers – buy a car; death of a family member or close friend; divorce; got a new job;
had children; just to give peace of mind about the future; lost your job and are looking to replace
benefits; marriage; previous financial difficulties; protecting my health; protecting my retirement
income; providing for my children; purchased a new property; saving for my retirement; to cover
inheritance tax; to cover the unexpected loss of my job; to get a better return on my money; to get a
guaranteed income in my retirement; you or a family member had a critical illness – each mapped in the
original table to one or more of the product areas General Insurance, Mortgages, Protection,
Investments, Pensions and Annuities.]
• Unstructured Data – sensor data, social media outpourings, video and images that do not fit neatly
into the rows and columns of most databases.
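To make the MapReduce idea referenced in the Hadoop and MapReduce entries concrete, the following minimal, self-contained word-count sketch imitates the map, shuffle and reduce phases in a single Python process; a real Hadoop job would distribute the same map and reduce logic across a cluster rather than run it in one script.

```python
# A minimal in-process imitation of MapReduce word counting. A real Hadoop job
# would run the same map and reduce logic in parallel across a cluster.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all the values emitted for the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values for each key; each key can be reduced independently."""
    return {key: sum(values) for key, values in grouped.items()}

documents = [
    "big data needs big infrastructure",
    "big data needs big ideas",
]

pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(shuffle_phase(pairs)))
# {'big': 4, 'data': 2, 'needs': 2, 'infrastructure': 1, 'ideas': 1}
```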
The Augmented Analytics Reset In Retail
 

Avelo_BigData_Whitepaper

Avelo's current business is focused on the UK, but the company is looking to expand its presence into continental Europe, the US, and beyond.
Avelo has applications that are leaders in the UK market, such as the following:
o Adviser Office. This software is used by more than a thousand adviser firms offering wealth management and financial advice, both multi-tiered and independent. Adviser Office integrates and links with over 70 partners, including product providers, portals and fund supermarkets, to aggregate client data and avoid data re-keying.
o Exchange Portal. The Exchange Portal is the largest provider of online life and pension quotations in the UK, with over 30,000 registered users. It provides online information and transaction services, with over a third of a billion client quotations processed between March 2008 and March 2011.
o Avelo Trigold. Around two thirds of the UK's mortgage advisors use the software to research and apply for mortgages on their customers' behalf.
o Mortgage Sales & Originations. It is estimated that one in four UK mortgages currently touch Avelo's point-of-sale and originations systems.
Even though Avelo's applications are thriving as individual successes, the applications are separate and do not interoperate beyond customer-specific integrations. Avelo is now developing a common enterprise data model so that it can align its applications and support them with a single strategic platform.
What is Big Data?
Big Data is a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. The challenges include cost, capture, storage, search, sharing, transfer, analysis, visualization and return on investment. As of 2012, the limits on the size of data sets that were feasible to process in a reasonable amount of time were of the order of Exabytes (see Glossary) of data. Data sets are growing in size because they are being gathered by ubiquitous, information-hungry mobile devices, global social media sites, increasingly sophisticated consumer-facing company websites and swathes of Business-to-Business (B2B) and Business-to-Consumer (B2C) Business Intelligence (BI) processes.
Big Data is difficult to work with using most relational database management systems and desktop statistical and visualization packages, as it requires huge amounts of parallel software running on tens, hundreds or even thousands of servers. However, Big Data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile and to answer questions that were previously considered beyond your reach. Until now, there was no practical way to harvest this opportunity.
Business Benefits of Big Data
There are compelling business reasons for developing Big Data analytical capabilities:
o Performance Management. Performance management involves understanding the meaning of data in company databases using pre-determined queries and multidimensional analysis. The data used for this analysis are transactional, for example years of customer purchasing activity or inventory levels and turnover. Managers can ask questions such as which are the most profitable customer segments, and get answers in real time that can be used to help make short-term business decisions and longer-term plans. The main challenge is to ensure the quality and completeness of transactions entered into the system, or the result will be "garbage in, garbage out." Also, to guarantee a complete picture of the business, multiple databases across functions have to be integrated.
o Data Exploration. Data exploration makes heavy use of statistics to experiment and get answers to questions that managers might not have thought of previously. This approach leverages predictive modelling techniques to predict user behaviour based on their previous business transactions and preferences. Cluster analysis (see Glossary) can be used to segment customers into groups based on similar attributes. Once these groups are discovered, managers can perform targeted actions such as customizing marketing messages, upgrading services and cross- and up-selling to each unique group.
Another popular use case is to predict which group of users may "drop out." Armed with this information, managers can proactively devise strategies to retain this user segment and lower the churn rate. With an increased emphasis on digital, inbound marketing, organizations want to attract prospects to their websites with engaging, robust and targeted content. By running experiments on these groups, managers can predict which combination of variables will lead to the highest conversion rate of site visitors to qualified leads, and of qualified leads to customers.
Target, the large US retailer, used data mining techniques to predict the buying habits of clusters of customers that were going through a major life event. Predicting which customers are going through big life changes such as pregnancy, marriage and divorce is important to retailers, since these customers are most likely to be flexible and change their buying habits, making them ideal targets for advertisers. Target was able to identify roughly 25 products, such as unscented lotion and vitamin supplements, that when analysed together helped determine a "pregnancy prediction" score. Target then sent promotions focused on baby-related products to women based on their pregnancy prediction score. The result: sales of Target's Mum and Baby products sharply increased soon after the launch of the new advertising campaigns. Target had to adjust how it communicated this promotion to women who were most likely pregnant once it had learned that the initial advertising had made some of them upset. As a result, Target made sure to include advertisements that were not baby-related so the baby ads would look random. (A minimal sketch of this kind of cluster-based segmentation appears at the end of this section.)
o Social Analytics. Social analytics measure the vast amount of non-transactional data that exists today. Much of this data exists on social media platforms, such as conversations and reviews on Facebook, Twitter and Google+. Social analytics measure three broad categories: awareness, engagement, and word-of-mouth or reach. Awareness looks at the exposure or mentions of social content and often involves metrics such as the number of video views and the number of followers or community members. Engagement measures the level of activity and interaction among platform members, such as the frequency of user-generated content. More recently, mobile applications and platforms such as Foursquare (www.foursquare.com) provide organizations with location-based data that can measure brand awareness and engagement, including the number and frequency of check-ins, with active users rewarded with badges. Reach measures the extent to which content is disseminated to other users across social platforms. Reach can be measured with variables such as the number of re-tweets on Twitter and shared likes on Facebook.
Social metrics are critical since they help inform managers of the success of their external and internal social digital campaigns and activities. For example, marketing campaigns involving contests and promotions on Facebook can be assessed through the number of consumer ideas submitted and the community comments related to those ideas. If the metrics indicate poor results, managers can pivot and make changes.
With recent advancements in social measurement techniques, we can now calculate an individual's "digital footprint" in the social media world. Companies like PeerIndex (www.peerindex.com) and Klout (www.klout.com) can measure a digital user's social influence. A Klout score ranges from 1 to 100, based on an algorithm involving the number of followers, re-tweets, the influence of the followers themselves and other variables. Marketers are using social metrics to identify "influencers", those well-followed individuals who are discussing their particular brand and can serve as brand advocates. Using Klout's services, Virgin America identified 120 individuals with high Klout scores and offered them a free flight to promote its new Toronto route. These individuals were under no obligation to write about their experience, but between these 120 individuals and another 144 engaged influencers, the campaign resulted in a total of 4,600 tweets, 7.4 million impressions and coverage in top news outlets. Thus, the campaign created high brand awareness of the new airline route.
o Decision Science. Decision science involves experiments and analysis of non-transactional data, such as consumer-generated product ideas and product reviews, to improve the decision-making process. Unlike social analysts, who focus on social analytics to measure known objectives, decision scientists explore social Big Data as a way to conduct "field research" and to test hypotheses. Crowdsourcing, including idea generation and polling, enables companies to pose questions to the community about their products and brands. Decision scientists, in conjunction with community feedback, determine the value, validity, feasibility and fit of these ideas and eventually report on if and how they plan to put these ideas into action. The My Starbucks Idea program enables consumers to share, vote on and submit ideas regarding Starbucks' products, customer experience and community involvement. Over 100,000 ideas have been collected to date. Starbucks has an "Ideas in Action" section to discuss where ideas sit in the review process.
Many of the techniques used by decision scientists involve listening tools that perform text and sentiment analysis. By leveraging these tools, companies can measure specific topics of interest around their products, as well as who is saying what about these topics. For example, before a new product is launched, marketers can measure how consumers feel about price, the impact that demographics may have on sentiment, and how price sentiment changes over time. Managers can then adjust prices based on these tests.
Whirlpool, the manufacturer of home appliances, wanted in 2009 to discover what its customers and consumers were saying about its products and services on social media platforms. It used Attensity360 (www.attensity.com) for continuous monitoring and analysis of conversations across popular channels such as Facebook, Twitter and YouTube, review and blogger sites, and mainstream news. Attensity's text analytics findings were incorporated into Whirlpool's decision models to accurately predict customer churn, loyalty and satisfaction. This process enabled the company to listen, respond and measure on a scale unobtainable by manual methods. The results showed that Whirlpool improved its understanding of its overall business, with increased customer satisfaction, faster responsiveness and, overall, more satisfied experiences with customers.
The company also incorporated customer feedback to improve its product development and planning process. While technology has helped companies scale the listening process involving social Big Data, the accuracy of listening tools is nowhere near perfect. Manual work is needed to "train" these technologies on company- and industry-specific keywords with regard to textual and sentiment analysis.
With respect to future trends in the Big Data field, the following practice is starting to emerge:
o Integrating multiple Big Data strategies. While a company can be effective with a single Big Data strategy, the most effective companies leveraging Big Data today are combining strategies. For example, one financial institution is leveraging both Social Analytics (non-transactional social data) and Performance Management (business intelligence using transactional data) strategies to guide its customer service. The bank traditionally determined its "top" customers based on metrics such as the number and balance of accounts; these were the customers who received premium service. Now, the bank is planning to incorporate social metrics into the equation. Those online customers who are very active with respect to mentioning, engaging with, and promoting the bank on social channels will also be considered for high-level service programs. The financial institution believes this is a much more balanced way to segment its most influential customers for customer service.
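The Data Exploration strategy above depends on cluster analysis to segment customers into groups with similar attributes. As a minimal, illustrative sketch of how such segmentation could be prototyped (not a description of Avelo's actual models), the following pure-Python example groups synthetic customers by age and annual premium spend using a simple k-means loop; the attribute names, scaling and data are invented for illustration only.

    import random
    import math

    random.seed(42)

    # Synthetic customer records: (age, annual_premium_spend_gbp). Invented data for illustration.
    customers = [(random.gauss(age, 5), random.gauss(spend, 150))
                 for age, spend, n in [(30, 600, 40), (45, 1200, 40), (62, 2500, 40)]
                 for _ in range(n)]

    def distance(a, b):
        # Crudely rescale spend so that it does not swamp age; real work would standardise features.
        return math.hypot(a[0] - b[0], (a[1] - b[1]) / 50.0)

    def kmeans(points, k=3, iterations=20):
        centroids = random.sample(points, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
                clusters[nearest].append(p)
            # Recompute each centroid as the mean of its cluster, keeping the old one if empty.
            centroids = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        return centroids, clusters

    centroids, clusters = kmeans(customers)
    for i, (centre, members) in enumerate(zip(centroids, clusters)):
        print(f"Segment {i}: {len(members)} customers, average age {centre[0]:.0f}, average spend £{centre[1]:.0f}")

In practice a statistical library and proper feature scaling would be used, but the loop captures the idea: customers fall into a small number of groups that managers can then treat differently.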
Big Data within Financial Services
The Financial Services industry is amongst the most data-driven of industries. The regulatory environment that commercial banks and insurance companies operate within requires these institutions to store and analyse many years of transaction data. For the most part, financial services firms have relied on relational technologies coupled with business intelligence tools to handle this ever-increasing data and analytics burden. It is, however, increasingly clear that while such technologies will continue to play an integral role, new technologies – many of them developed in response to the data analytics challenges first faced in e-commerce, internet search and other industries – have a transformative role in enterprise data management. The challenge, as outlined in 'What is Big Data?' above, is not only one of sheer data volumes but also of data variety and the timeliness with which such varied data needs to be aggregated and analysed.
As data-driven as financial services companies are, analysts estimate that somewhere between 80 and 90 percent of the data that financial services firms hold is unstructured, i.e. in documents and in text form. Technologies that enable businesses to marry this data with structured content present an enormous opportunity for improving business insight. Take, for example, information stored in insurance claim systems. Much valuable information is captured in text form. The ability to parse text information and combine the extracted information with structured data in the claims database will not only enable a firm to provide a better customer experience, it may also enhance its fraud detection capabilities. These and other data management related challenges and opportunities have been succinctly captured and classified by others under the 'Four Vs' of data – Volume, Velocity, Variety and Value.
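The insurance claims example above, parsing free-text claim notes and combining the extracted signals with structured claim records, can be sketched very simply. The example below is illustrative only: the keyword list, field names and threshold are invented, and a production system would use proper text analytics tooling rather than keyword matching.

    import re

    # Hypothetical structured claim records joined with free-text adjuster notes.
    claims = [
        {"claim_id": "C1001", "amount_gbp": 12500, "days_since_policy_start": 20,
         "notes": "Claimant reported theft; no receipts available, inconsistent dates given."},
        {"claim_id": "C1002", "amount_gbp": 900, "days_since_policy_start": 420,
         "notes": "Minor water damage to kitchen, photos and plumber invoice supplied."},
    ]

    # Illustrative red-flag phrases a text-mining step might extract from unstructured notes.
    RED_FLAGS = [r"no receipts?", r"inconsistent", r"cash only", r"prior claims? withheld"]

    def text_signals(notes: str) -> int:
        """Count red-flag phrases found in the unstructured notes."""
        return sum(1 for pattern in RED_FLAGS if re.search(pattern, notes, re.IGNORECASE))

    def review_score(claim: dict) -> float:
        """Blend structured fields with text-derived signals into a crude review score."""
        score = text_signals(claim["notes"]) * 2.0
        if claim["amount_gbp"] > 10000:
            score += 1.0                      # large claims get extra scrutiny
        if claim["days_since_policy_start"] < 30:
            score += 1.5                      # claims made very soon after policy inception
        return score

    for claim in claims:
        flag = "refer to fraud team" if review_score(claim) >= 3 else "standard handling"
        print(claim["claim_id"], review_score(claim), flag)

The point is not the scoring rule itself but the join: the unstructured notes contribute features that sit alongside the structured claim fields when deciding which cases to refer for review.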
The visionary financial services institution needs to deliver business insights in context, on demand and at the point of interaction, by analyzing every bit of data available. Big Data technologies comprise the set of technologies that enable those institutions to deliver on that vision. To a large extent, these technologies are made feasible by the rising capabilities of commodity hardware, the vast improvements in storage technologies and the corresponding fall in the price of computing resources.
The ABN AMRO experience (November 2011): Banks are traditionally considered the most advanced in data management. Highly transactional and digitally advanced, some banks are in fact difficult to distinguish from IT firms. They invest heavily in data infrastructure, as well as in the skills needed to analyse and interpret digital information. "Analysing financial data is the starting point of any financial institution," said Paul Scholten, chief operating officer (COO) of ABN AMRO's retail and private banking business. ABN AMRO has clean, complete financial data on both its customers and their internal operations. They capture nearly everything (for regulatory purposes) but only use the most valuable data for insight, although they actively seek out new sources of data. However, there are challenges, and Mr Scholten points to three obstacles that businesses across the financial services sector are facing:
1. Privacy - "We have the data and tools that can help our customers understand their spending habits at a deep level," he says. "We can help them analyse their investment strategies, understand their tax situation better and save money. But we run into privacy issues with these things, and we have to be careful about what belongs to us, what belongs to customers and what belongs to the government."
2. Unstructured data (see Glossary) - "We are used to structured, financial data," he says. "We are not so good at the unstructured stuff." He says the company is just beginning to understand the uses of social media, and what might be possible in terms of improving customer service.
3. Combining data across functions to yield new insights - Although ABN AMRO has an advanced risk analysis department, it does not cross-reference this data with marketing, regulatory or customer data sets. "We are working on that," he says. "There is value to be had there." In particular, Mr Scholten says that cross-referencing client complaints with operational risk might yield deeper insight into how operational problems affect customer service.
The Avelo Big Data project
Avelo is unique in that it sits at the heart of the Financial Services sector in the UK and is the conduit through which most Banks, Insurance Companies and Intermediaries conduct their business. As a consequence of its position, Avelo is perfectly placed not only to use its domain knowledge to analyse and interpret digital information, but also to leverage its deep understanding of technology to assemble the best infrastructure for any Big Data project.
The two-dimensional matrix below provides a convenient, albeit incomplete, starting framework for decomposing the high-level technology requirements for managing Big Data.
The vertical axis shows the degree to which data is structured: data can be unstructured, semi-structured or structured (see Glossary). The horizontal axis shows the lifecycle of data: data is first acquired and stored, then organized and finally analyzed for business insight.
Source: Oracle white paper, June 2012, Financial Services Data Management: Big Data Technology in Financial Services.
The diagram suggests that a myriad of disparate technologies are needed to comprehensively handle Big Data requirements for the enterprise; however, these are not 'either/or' technologies. They are to be viewed as parts of a data management continuum: each technology enjoys a set of distinct advantages depending on the phase in the lifecycle of data management and on the degree of structure within the data it needs to handle, and so these technologies work together within the scope of an enterprise architecture.
Avelo has researched the market and developed partnerships with companies that are best able to deliver Big Data projects across both the horizontal and the vertical axes. Those companies chose to work with Avelo because of Avelo's domain knowledge. Whereas many industry participants (Banks and Insurance Companies) may have deeper knowledge of segments of the market, they lack the holistic view which is essential in the analysis and interpretation of Big Data. For example, practices in one part of the industry can be used to improve and inform practices in another part; without a holistic view this education would not be possible, and it is this sort of activity that makes the most of Big Data advances.
The following are some examples of the Big Data related projects that Avelo is working on:
o Structured data – Avelo is looking at the client data that passes through discrete sets of its software (e.g. the Avelo Exchange) to see if they can gather, organize and build advanced analytics that help their customers run their businesses more efficiently, be it from a pricing, distribution or product development perspective.
o Semi-structured data – Avelo is looking at the data that passes through all their products to see if they can build a more holistic view of consumer behaviour within financial services. They believe they can blend data sets to give more depth to customer insight while also maintaining a higher freshness value (see Glossary). They know this data will be valuable to their clients, giving near real-time insights into customer behaviour.
o Unstructured data – Avelo is considering working with third parties (e-aggregators and affinity groups) to see if they can blend their structured and semi-structured data. They will also investigate whether consumers would want to upload their social media and other unstructured data into Avelo's environment, so that Avelo can provide timely, pertinent insights into each consumer's financial status and planning.
Avelo want to be seen to be helping consumers of financial services make better, more informed decisions. Financial services can feel overwhelming to consumers, as the concepts are often abstract and in some cases alien to any other experience in their lives. As a consequence, decisions are in many cases not given the full attention they deserve. Avelo believe they can help consumers by building predictive analytics that sort through the masses of financial data, making it easier to make decisions by delivering only the most relevant information, which consumers can use to make comparisons, inform decisions and ultimately fulfil.
Constructing a Big Data capability at Avelo
Avelo defines a Big Data capability as the roles, technologies, processes and culture needed to support Big Data initiatives. Perhaps the most critical of these are the roles, and in particular the expertise and experience needed to devise and implement Big Data strategies. Multiple roles are needed:
• statisticians who are skilled in the latest statistical techniques;
• analysts and decision scientists who understand business measurement and experimentation and who can be the broker between statisticians and business managers;
• the IT group, who provide guidance on selecting Big Data technologies and techniques and who integrate business intelligence tools with transactional systems such as CRM and web analytical tools; and
• business managers and knowledge workers who own the business process and have to be comfortable with the new "language" of Big Data and social analytics.
Building the algorithms
At Avelo this is an iterative process: the team has domain expertise which it uses to build the algorithms, but these algorithms have to be tested and exercised regularly to ensure they remain fit for purpose. There have been instances where Avelo has devised an algorithm that has a high probability of identifying the target group, but that relies on a data set common to only a small proportion of the population. To extract value from the exercise they had to reduce the number of variables in the algorithm so that it applied to a larger population, but by doing so they lost some accuracy. Over time, through a number of derivations and by finding more variables and data to mine, they enhanced the algorithm. In many instances they end up with algorithms that are vastly different from the ones they started with, but in all cases the latest incarnations are the strongest: they have evolved through this iterative process, all the while benefitting from Avelo's domain knowledge and deep data expertise. These algorithms are what set Avelo apart from any other provider of Big Data analytics within Financial Services.
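The trade-off described above, between a very accurate algorithm that only applies to the small subset of the population for which all of its input variables are available and a simpler algorithm that covers more people with less accuracy, can be made concrete with a toy simulation. Everything below is synthetic and purely illustrative; it is not Avelo's algorithm or data.

    import random

    random.seed(1)

    # Synthetic population: each record has a hidden "target" flag and a set of
    # observable attributes, not all of which are available for every individual.
    ATTRIBUTES = ["income", "dependants", "mortgage_balance", "existing_cover"]

    def make_person():
        person = {"target": random.random() < 0.3}           # 30% are genuinely in the target group
        for attr in ATTRIBUTES:
            if random.random() < 0.6:                        # each attribute is observed 60% of the time
                # observed values are loosely correlated with the hidden target flag
                person[attr] = random.gauss(1.0 if person["target"] else 0.0, 0.8)
        return person

    population = [make_person() for _ in range(10000)]

    def score(person, required_attrs):
        """Return a True/False prediction, or None if the person lacks a required attribute."""
        if any(a not in person for a in required_attrs):
            return None                                      # the scoring rule does not apply
        return sum(person[a] for a in required_attrs) / len(required_attrs) > 0.5

    def evaluate(required_attrs):
        scored = [(p, score(p, required_attrs)) for p in population]
        applicable = [(p, s) for p, s in scored if s is not None]
        flagged = [p for p, s in applicable if s]
        precision = sum(p["target"] for p in flagged) / max(len(flagged), 1)
        coverage = len(applicable) / len(population)
        print(f"{len(required_attrs)} variables: coverage {coverage:.0%}, precision {precision:.0%}")

    evaluate(ATTRIBUTES)        # all four variables: higher precision, but applies to few people
    evaluate(ATTRIBUTES[:2])    # two variables: much wider coverage, lower precision

Dropping variables widens the population the scoring rule can be applied to, at the cost of precision; finding new variables and data sources to mine is what recovers that lost accuracy over successive iterations.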
Overcoming the legal hurdles
Companies need to keep up with policies and guidelines surrounding the use of Big Data, especially with non-transactional, social data that is often created and accessed outside company walls. Big Data policies should address issues regarding compliance, privacy and security. Leading organizations must communicate and be honest in telling customers and consumers how they are using personal data, such as demographic information and past purchases. A rule of thumb that organizations should follow is to always think about the customer/employee experience and their personal benefits from Big Data projects. Big Data projects that create a negative experience for users, despite the company benefits, should be redesigned.
What is Avelo's protection gap Big Data offering?
The Protection Gap in the UK is increasing. Swiss Re's Term and Health Watch 2012 found that the Life Assurance Protection Gap (the gap between what consumers should have for adequate cover and what they actually have) for the UK has increased by 20% from £2.0 trillion to £2.4 trillion over the past ten years. On average, this gap amounts to around £100,000 per person, with the amount of under-insurance greatest among single parents, couples with children and those aged 35 and under. Similarly, the Income Protection Gap has increased by 46% over the same period, from £130 billion of annual benefit to £190 billion of annual benefit between 2002 and 2012.
Swiss Re's UK CEO, Russell Higginbotham, says: "Under-insurance in both the life insurance and income protection areas is proving to be a long-term problem in the UK. The industry is faced with the challenge of better communicating to consumers how to alleviate the financial burden placed on families and dependents in difficult times. Simple life assurance cover is not expensive; for example, a healthy 35-year-old male non-smoker would only pay around £2 per week for £100,000 of life cover to age 65."
The protection gap theme was also taken up by Swiss Re's Chief Executive Officer of Reinsurance, Christian Mumenthaler, who looked at how life insurers can address the sales and distribution challenges associated with new business. "As our new study shows, protection cover is not top of consumers' minds when they think about providing for their dependents' financial needs or their own ill health or disability. One of the major barriers is that consumers think the sales process is too complicated," he said. He used statistics drawn from Swiss Re's Magnum system to show how potential life insurance policyholders can be put off buying life cover when subjected to a lengthy underwriting process.
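To make the idea of a protection gap concrete, the sketch below estimates a simple per-household gap as the difference between a rough cover requirement (a multiple of income plus outstanding debts) and the cover actually held. The formula, field names and figures are illustrative assumptions, not Swiss Re's or Avelo's methodology.

    # A deliberately simple protection-gap estimate. The 10x-income multiple and the
    # sample household below are assumptions chosen purely for illustration.
    INCOME_MULTIPLE = 10          # rough rule of thumb for income replacement needs

    def protection_gap(annual_income, outstanding_mortgage, other_debts, existing_cover):
        """Return the shortfall between suggested life cover and cover actually in force."""
        suggested_cover = annual_income * INCOME_MULTIPLE + outstanding_mortgage + other_debts
        return max(suggested_cover - existing_cover, 0)

    household = {
        "annual_income": 30000,
        "outstanding_mortgage": 140000,
        "other_debts": 8000,
        "existing_cover": 150000,   # e.g. a mortgage-linked term policy only
    }

    gap = protection_gap(**household)
    print(f"Estimated protection gap: £{gap:,.0f}")   # £298,000 for this illustrative household

At scale, a score like this, computed from data already flowing through adviser and portal systems, is what allows groups of under-insured consumers to be identified and benchmarked.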
Comparing the conversion rates of sales made by intermediaries, the success rate for cases accepted at point of sale at standard rates is around 90%. This falls to about 70% when the case is referred to an underwriter - consumers can easily lose interest. Mumenthaler explained that by harnessing the right data - such as health club memberships, buying patterns at the local supermarket and so on - insurers could automatically "pre-select" consumers who appear to represent a low risk, and who could be put on risk immediately at the point of sale without the burden of underwriting.
Avelo, with its partners, is developing a service that targets individuals with a protection gap. Using their domain experience and information gleaned from their products, Avelo are building algorithms that not only identify groups of consumers with a protection gap but also assess whether they are simple or complex cases from an underwriting perspective. They are also able to benchmark those groups against other consumers that were in a similar situation but took action to close the gap. Avelo believe that this data, in a direct-to-consumer channel offering calibrated with the latest thinking in behavioural science and supported by a co-browsing or telephone advice experience, is unique in the market and could be part of the solution to the Protection challenge in the UK.
What other Big Data offerings could Avelo explore?
Aviva, in a September 2010 report, stated that the UK - and indeed the whole of Europe - is facing a retirement funding shortfall. The report was produced to quantify the pensions gap, to open up the debate and to find solutions. The pensions gap refers to the difference between the income needed to live comfortably in retirement and the actual income individuals can currently expect. The study, carried out by Aviva in conjunction with Deloitte, showed that the annual European pensions gap for individuals retiring over the next 40 years is £1.6 trillion. The UK's pensions gap stood at £318 billion and equated to an average of £10,300 per UK adult annually - the largest per-person total across all European countries studied.
Aviva believe that engaging and empowering people with:
1. annual statements that show a forecast of pension retirement income from all sources, in one place; and
2. interactive tools (like a pension calculator) demonstrating how saving more earlier in life can improve their retirement income
would help tackle the savings gap, giving people a greater understanding of what they are likely to receive and thereby encouraging them to save during their working lives for a better income in retirement.
In the same way that Avelo is developing a service to tackle the Protection Gap in the UK, Avelo could also use a similar process to tackle the Pension Gap. Again, using their domain experience and information gleaned from their products, Avelo could build algorithms that identify groups of consumers with a pension gap. These groups could be benchmarked against other groups of consumers that were in a similar situation but took action to close the gap. The benefits of closing the gap could be quantified in real-life stories and used to persuade others to take action. Again, Avelo believe that this data, in a direct-to-consumer channel offering that calculates forecasts of pension retirement income from all sources and is supported by a co-browsing or telephone advice experience, would also be unique in the market and could be part of the solution to the Pensions challenge in the UK.
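The "pension calculator" idea mentioned above (showing how saving more, earlier, improves retirement income) reduces to simple compound-growth arithmetic. The sketch below compares two illustrative savers; the contribution level, growth rate and annuity rate are assumptions chosen purely for illustration.

    def projected_pot(monthly_contribution, start_age, retire_age=65, annual_growth=0.05):
        """Future value of regular monthly contributions with monthly compounding."""
        months = (retire_age - start_age) * 12
        rate = annual_growth / 12
        return monthly_contribution * (((1 + rate) ** months - 1) / rate)

    def annual_income(pot, annuity_rate=0.05):
        """Very rough conversion of a pension pot into a yearly retirement income."""
        return pot * annuity_rate

    for label, monthly, start_age in [("Starts saving at 25", 200, 25), ("Starts saving at 40", 200, 40)]:
        pot = projected_pot(monthly, start_age)
        print(f"{label}: pot £{pot:,.0f}, indicative income £{annual_income(pot):,.0f} a year")

Feeding a forecast like this with actual contribution and policy data from all sources, rather than asking consumers to type it in, is where a vendor sitting across providers and intermediaries could add value.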
Summary
With the cost of data capture and acquisition decreasing at a rapid rate, the real value of Big Data is in its use. Companies that effectively create and implement Big Data strategies, such as those described in this paper, stand to gain a competitive advantage. Avelo has made a business decision that it wants to explore the Big Data opportunities within Financial Services, build a reputation for excellence in that field and help its clients develop a competitive edge. We hope you find this white paper informative; should you require any further information, please do not hesitate to contact Avelo.
References
Parise, S., Iyer, B., & Vesset, D. Four strategies to capture and create value from Big Data, Ivey Business Journal, July/August 2012 (www.iveybusinessjournal.com/topics/strategy/four-strategies-to-capture-and-create-value-from-big-data).
Vesset, D., Morris, H.D., Little, G., Borovick, L., Feldman, S., Eastwood, M., Woo, B., Villars, R.L., Bozman, J.S., Olofson, C.W., Conway, S., & Yezhkova, N. IDC market analysis: Worldwide big data technology and services, IDC #233485, March 2012.
Duhigg, C. How companies learn your secrets, The New York Times, 2/16/2012.
Hoffman, D.L., & Fodor, M. (2010). Can you measure the ROI of your social media marketing?, MIT Sloan Management Review, 52(1), 41-49.
Schaefer, M.W. (2012). Return on influence, McGraw-Hill, pp. 5-6.
IDC Customer Spotlight: Whirlpool Corporation's digital detectives: Attensity provides the lens, March 2011.
www.sas.com/knowledge-exchange/risk/integrated-risk/big-data-a-big-bummer-for-financial-services
www.ibm.com/software/data/bigdata/
Oracle white paper (June 2012), Financial Services Data Management: Big Data Technology in Financial Services.
Swiss Re, Term & Health Watch 2013.
Aviva, Mind the Gap: quantifying the pension gap in the UK, September 2010.
Glossary
Some of the following terms are used in this document; others you will come across when discussing 'Big Data' with industry participants.
• Apache Software Foundation (www.apache.org) – provides support for the Apache community of open-source software projects (this includes Hadoop and Subversion – see below), which provide software products for the public good.
• Cluster – cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields.
• Cluster analysis – is not in itself one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find them efficiently. Clustering can be formulated as a multi-objective optimization problem. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and error. It will often be necessary to modify data pre-processing and model parameters until the result achieves the desired properties.
• Data Artisan – people who understand data analysis and are business savvy.
• Data Scientist – people who incorporate varying elements and build on techniques and theories from many fields, including maths, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modelling, data warehousing and high-performance computing, with the goal of extracting meaning from data and creating data products.
• DBMS (DW) – Database Management System: Data Warehouse.
• DBMS (OLTP) – Database Management System: Online Transaction Processing.
• Distributed File Systems – in computing, a distributed file system or network file system is any file system that allows access to files from multiple hosts.
• Exabyte (EB) – an Exabyte is a large unit of computer data storage. The prefix 'Exa' means one billion billion, or one quintillion. An Exabyte is approximately one quintillion Bytes, or a billion Gigabytes. An Exabyte of storage could contain 50,000 years' worth of DVD-quality video. The world's technological capacity to store information grew from 2.6 (optimally compressed) Exabytes in 1986 to 15.8 in 1993 and to over 54.5 in 2000. As of 2012, about 2.5 Exabytes of data are created each day, and that number is doubling every 40 months; 90% of the data in the world today has been created in the last two years (www.ibm.com/software/data/bigdata/). A Byte (B) is smaller than a Megabyte (MB), which is smaller than a Gigabyte (GB), which is smaller than a Terabyte (TB), which is smaller than a Petabyte (PB), which is smaller than an Exabyte (EB), which is smaller than a Zettabyte (ZB), which is smaller than a Yottabyte (YB).
• Freshness value – i.e. how fresh is the data and the insight? For triggers and nudges to be valuable, the data must have a high 'freshness value'.
• Google BigQuery – allows you to run SQL-like queries against very large datasets, with potentially billions of rows. This can be your own data, or data that someone else has shared with you. BigQuery works best when analysing very large datasets, typically using a small number of very large, append-only tables. For more traditional relational database scenarios, you might consider using Google Cloud SQL instead. You can use BigQuery through a web UI called the BigQuery browser tool, the bq command-line tool, or by making calls to the REST API using various client libraries in multiple languages, such as Java and Python.
• Google Dremel – Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and a columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and Petabytes of data, and has thousands of users at Google.
• Google File System – a proprietary distributed file system developed by Google Inc. for its own use.
• Hadoop (HD) (www.hadoop.apache.org) – a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce.
a. What Hadoop is not – the developers behind Hadoop see a lot of emails from people who hear about Hadoop and think it will be the silver bullet to solve all their application/data centre problems. It is not. It solves some specific problems for some companies and organisations, but only after they have understood the technology and where it is appropriate. If you start using Hadoop in the belief that it is a drop-in replacement for your database or SAN file system, you will be disappointed.
b. Hadoop is not a substitute for a database – databases are wonderful. Issue an SQL SELECT call against an indexed/tuned database and the response comes back in milliseconds. Want to change that data? SQL UPDATE and the change is in. Hadoop does not do this. Hadoop stores data in files, and does not index them. If you want to find something, you have to run a MapReduce job going through all the data. This takes time, and means that you cannot directly use Hadoop as a substitute for a database. Where Hadoop works is where the data is too big for a database (i.e. you have reached the technical limits, not just that you don't want to pay for a database license). With very large datasets, the cost of regenerating indexes is so high that you can't easily index changing data. With many machines trying to write to the database, you can't get locks on it. Here the idea of vaguely-related files in a distributed file system can work. There is a high-performance, column-oriented database that runs on top of Hadoop HDFS: Apache HBase. This is a great place to keep the results extracted from your original data.
c. MapReduce (MR) is not always the best algorithm – MapReduce is a profound idea: taking a simple functional programming operation and applying it, in parallel, to Petabytes of data. But there is a price. For that parallelism, you need each MR operation to be independent from all the others. If you need to know everything that has gone before, you have a problem, although such problems can be aided by iteration and shared state information.
• HANA – SAP's in-memory data platform; it addresses large-scale analytics with an in-memory, column-oriented database rather than Hadoop's batch processing of files on disk.
• In-Memory Data Grids (IMDG) – designed to store data in main memory, ensure scalability and store an object itself. There are many IMDG products, both commercial and open source.
• Key-Value Stores – hold records that consist of a key paired with a payload.
• MapReduce (MR) – the key algorithm that the Hadoop (HD) engine uses to distribute work around a cluster (a minimal illustrative sketch of the MapReduce pattern appears at the end of this Glossary).
• Nested data – in general, something that is nested is fully contained within something else of the same kind. In programming, nested describes code that performs a particular function and is contained within code that performs a broader function; one well-known example is the nested do-loop. In data structures, data organizations that are separately identifiable but also part of a larger data organization are said to be nested within the larger organization. A table within a table is a nested table; a list within a list is a nested list.
• Nudges – at Avelo, 'Nudges' are the medium (e-mail, text, social media interaction) that, when informed by a Trigger or Pre-Trigger and used intelligently, increase the probability of persuading a consumer to fulfil through a D2C website.
• Pre-Triggers – at Avelo, these are models that predict a 'Trigger' (see Triggers below); they provide a probability score of that 'Trigger' happening.
• Structured data – the term structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data, or structured data records (SDR), is a database where specific information is stored based on a methodology of columns and rows.
• Subversion (www.subversion.apache.org) – Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.
• Triggers – at Avelo, 'Triggers' are events that spur a consumer into taking action, e.g. buying life insurance. That event could be a life-changing moment, e.g. the birth of a child or the death of a close relative (also see Pre-Triggers above). These triggers map against the product categories they most commonly prompt (General Insurance, Mortgages, Protection, Investments, Pensions and Annuities); the trigger events include: buy a car; death of a family member or close friend; divorce; got a new job; had children; just to give peace of mind about the future; lost your job and are looking to replace benefits; marriage; previous financial difficulties; protecting my health; protecting my retirement income; providing for my children; purchased a new property; saving for my retirement; to cover inheritance tax; to cover the unexpected loss of my job; to get a better return on my money; to get a guaranteed income in my retirement; and you or a family member had a critical illness.
  • 17. • Unstructured Data – sensor data, social media outpourings, video and images that do not fit neatly into the rows and columns of most databases.
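As referenced in the MapReduce entry above, the pattern takes a simple functional operation and applies it, in parallel, across huge volumes of data. As a minimal illustration of the shape of the pattern (map, shuffle/group, reduce), the sketch below counts words in a handful of in-memory strings in Python; it is purely illustrative rather than Hadoop code, although a real Hadoop job distributes exactly this structure across a cluster of machines.

    from collections import defaultdict
    from itertools import chain

    documents = [
        "protection gap in the uk",
        "pension gap and protection gap",
        "big data in financial services",
    ]

    def map_phase(doc):
        """Emit (key, value) pairs: here, (word, 1) for every word in the document."""
        return [(word, 1) for word in doc.split()]

    def shuffle(pairs):
        """Group all values by key, as the framework does between the map and reduce phases."""
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(key, values):
        """Combine the values for one key: here, a simple word count."""
        return key, sum(values)

    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    print(counts)   # e.g. 'protection' appears 2 times, 'gap' appears 3 times

Because each map call and each reduce call is independent of the others, the framework can run them on different machines; that independence is the price the glossary entry refers to.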