Auxilion - The Implications of Big Data on the Roadmap Towards Business Intelligence
1. www.auxilion.com
THE IMPLICATIONS OF BIG DATA ON THE
ROADMAP TOWARDS BUSINESS INTELLIGENCE
Pedro Mac Dowell Innecco – Business Intelligence Principal
2. AGENDA
• About Us
• A Quick Definition of BI
• What is Big Data
• Classifying Corporate Data
• Managing Big Data
3. ABOUT AUXILION
• Sister company of the IT Alliance Group
• Cloud transformation and support company
– Help customers leverage the benefits of cloud computing to their business strategy
• R&D: Looking at new ways of providing business value to our customers
– Development of new products and services
4. AS FOR ME…
BI Principal at Auxilion
• Oversees everything BI from a tactical point of view
• Engaging with customers and partners on BI projects
• Researching the convergence of BI, cloud and big data
Accreditations
• 20+ years experience in the IT industry
• MSc in Business Intelligence Systems and Data Mining (De Montfort University)
• Masters in Business Administration (Plymouth University)
• Microsoft Certified Professional (MCT Alumni, MCTS: SQL, Dynamics CRM, SharePoint)
• PRINCE2, ITIL, IASA CITA-F
5. DEFINITION OF “BUSINESS INTELLIGENCE”
“Business Intelligence is a set of methodologies, processes, architectures,
and technologies that transform raw data into meaningful and useful
information used to enable more effective strategic, tactical, and operational
insights and decision-making” (Forrester Research)
• The term “Business Intelligence” goes back 150 years!
– To increase profit by acting on information about the environment prior to the competition
– In a nutshell: A strategy to gain competitive advantage
Technology is just a subset of BI
6. WHAT IS BIG DATA
“High-volume, high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for insight and
decision making” (Gartner, 2013)
• Laney’s 3 Vs of Big Data
– High-volume: Very large datasets ranging from terabytes to petabytes of data.
– High-velocity: Not only the speed at which data is generated, but also the speed at which
it must be analysed.
– High-variety: Images, XML, documents, text, videos, sensor information (e.g. weather)
– These factors make big data a challenge to manage
• Additional Vs
– Veracity, Variability, Visualisation, Value
Big data represents a natural evolution for BI
7. CLASSIFICATION OF CORPORATE DATA
• Structured Data
– Records found in relational databases (they carry their own metadata)
– Immediately understood by systems
• Unstructured, repetitive data
– Records generated by analogue processing that appear very similar to one another
– Examples: Metering data, clickstreams
• Unstructured, non-repetitive data
– Likely to be totally different from one another in context and structure (high variability)
– Examples: Contractual documents, email messages (body)
Unstructured data accounts for 90% of all digital information
(International Data Corp, 2014)
9. KEY POINTS FOR MANAGING BIG DATA
• Getting business value starts by managing big data appropriately
– Big data management is becoming a majority practice
– Embrace big data as soon as possible to keep pace with its growth
• Extending data management skills and software portfolio
– Data management includes many data disciplines and software from multiple vendors
• Must be incorporated into enterprise data management
– Fold it into enterprise data architecture
• Strategy must combine both business (macro) and technology (micro)
• Cloud represents a significant opportunity for big data
– Convergence around the commodity of processing power
– Most chosen deployment model among successful big data initiatives, and is a major factor
10. TOP OPPORTUNITIES AND BARRIERS
• Opportunities
– Improved analytics: broader data sources, better information exploration
– Greater business value: business optimisation, addressing new requirements / change
– Improved sales and marketing: recognising opportunities, customer churn / behaviour
• Barriers
– Low maturity: staff/skills, data management infrastructure, new data sources
– Poor business support: Governance, business sponsorship, compelling business case
– Solution design: integration complexities, architecture of systems and enterprise data
Business hurdles are more serious than technology hurdles
11. BIG DATA SKILLS
• Big Data demands new skills and perhaps new hires
– Lack of skills to address integration complexities and network infrastructure
– Reinforces the need for architects
– Lack of time indicates the urgency of hiring new staff
• Organisations have most of their skills in traditional data
– Existing BI knowledge has a strong impact on the development of big data management
– Experience in traditional BI/DW increases success of big data initiatives
• Companies are more prone to hiring new staff than to training
– Time is of the essence, and is a significant hurdle
• Strong demand in mathematics
– Data scientists are mostly identified as being both developers and mathematicians
12. THE DATA LIFECYCLE
• Capacity planning is more important than ever
– 10 to 99 TB is common in big data
– The 1 PB barrier is likely to be broken in the next few years
• Consider the lifecycle of data
– Summarising/moving data according to decrease in demand
– Age can be a factor, but not always
• Not many organisations taking data lifecycle into consideration
– Performance and crippling costs likely to become an issue
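
A minimal sketch of the lifecycle idea above, assuming a simple three-tier storage model. The tier names, the thresholds (100 reads, 365 days) and the `Dataset` fields are hypothetical and chosen only for illustration. Demand drives the decision; age only matters once demand has dropped to zero.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    name: str
    created: date
    reads_last_90_days: int  # hypothetical demand measure


def choose_tier(ds: Dataset, today: date) -> str:
    """Pick a storage tier from demand first, age second (illustrative thresholds)."""
    age_days = (today - ds.created).days
    if ds.reads_last_90_days >= 100:
        return "hot"    # still in demand, however old it is
    if ds.reads_last_90_days == 0 and age_days > 365:
        return "cold"   # no demand and old: candidate for archiving
    return "warm"       # falling demand: candidate for summarising


today = date(2015, 6, 1)
old_but_busy = Dataset("orders_2010", date(2010, 1, 1), 450)
recent_but_idle = Dataset("campaign_logs", date(2015, 3, 1), 0)
stale = Dataset("legacy_meter_reads", date(2012, 1, 1), 0)

for ds in (old_but_busy, recent_but_idle, stale):
    print(ds.name, "->", choose_tier(ds, today))
```

The five-year-old dataset stays on the hot tier because it is still read heavily, which is the sense in which age is a factor but not always the deciding one.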
13. BRIDGING THE NEW AND THE OLD
• Big data can rely on existing platforms, additional ones or both
– Extended RDBMS (including Parallel DW), Hadoop and NoSQL
– Most organisations combine two or more systems
• Big data systems do not replace relational DBs or data warehouses
– The need for relational (structured) data is not going away
– DW is an architecture, not a technology! (Inmon)
– Most organisations are aware of this, despite claims by some big data vendors
• Traditional RDBMS systems are being extended with big data features
– DW appliances architected for massive parallel computing (Parallel Data Warehousing)
– Support the bridging between structured and unstructured data
– Big data often brought into RDBMS (either physically or virtually) for analysis
– Example: Querying data residing in Hadoop straight from SQL Server
14. WHERE SHOULD BIG DATA RESIDE?
• Trying to hold anything other than structured data in an RDBMS is a bad idea
• For unstructured data: Hadoop or NoSQL should be the norm
– Choice between both requires due diligence
– Some success stories arising from combining both systems
– Remember: Data can always be surfaced into transactional DBs for traditional analysis
• Data Lakes
– Data lakes are for big data what data warehouses are for traditional (relational) BI
– In DW we extract, transform, then load the data
– In Data Lakes we extract, load, and then transform the data as needed
– Data lakes hold all data in raw format so it can be processed according to future needs
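
The difference in ordering between the two approaches can be sketched in a few lines, assuming in-memory lists stand in for the warehouse and the lake. The sample records and the names `transform`, `total_amount` and `notes` are hypothetical.

```python
import json

# Raw source records, as they might arrive from an application log.
raw_events = [
    '{"user": "a1", "amount": "19.90", "note": "first order"}',
    '{"user": "b2", "amount": "5.00"}',
]


def transform(record: dict) -> dict:
    """Shape a raw record into the schema one particular report needs."""
    return {"user": record["user"], "amount": float(record["amount"])}


# DW style (extract, transform, load): only the shaped rows are stored,
# so fields the report did not need (here "note") are gone.
warehouse = [transform(json.loads(line)) for line in raw_events]

# Data lake style (extract, load, transform): raw lines are stored untouched.
data_lake = list(raw_events)


def total_amount(lake) -> float:
    """Transform at read time for a revenue question."""
    return sum(transform(json.loads(line))["amount"] for line in lake)


def notes(lake) -> list:
    """A later, unplanned question can still be answered from the raw data."""
    parsed = (json.loads(line) for line in lake)
    return [rec["note"] for rec in parsed if "note" in rec]


print(total_amount(data_lake))
print(notes(data_lake))
```

Because the lake keeps the raw form, the `note` field remains available for questions nobody anticipated at load time, whereas the warehouse rows have already discarded it.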
15. DATA CONTEXTUALISATION
• Contextualisation is a must for unstructured, non-repetitive data
– Understand homonyms, slang, typos and sarcasm
– Decipher human emotions (important to anticipate customer churn)
• Examples of context
– what type of data has been stated; what was it stated in response to; where was it
stated; when was it stated; how was it stated; who stated it; the day and time it was
stated; and what was stated before and after it.
– Involves natural language processing
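
One way to picture the context listed above is as a record that travels with each statement. This is only a sketch: the `Statement` fields and the `contextualise` helper are hypothetical names, and the `tone` field is left empty because filling it would need a natural language processing step that is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Statement:
    text: str                          # what was stated
    data_type: str                     # what type of data it is
    author: str                        # who stated it
    channel: str                       # where it was stated
    stated_at: datetime                # when (day and time)
    tone: Optional[str] = None         # how it was stated (would come from NLP)
    in_reply_to: Optional[str] = None  # what it was stated in response to
    previous: Optional[str] = None     # what was stated before it
    following: Optional[str] = None    # what was stated after it


def contextualise(thread, index, **fields) -> Statement:
    """Attach the neighbouring statements in a thread to the one at `index`."""
    return Statement(
        text=thread[index],
        previous=thread[index - 1] if index > 0 else None,
        following=thread[index + 1] if index + 1 < len(thread) else None,
        **fields,
    )


thread = [
    "My broadband is down again.",
    "Great service, as usual.",
    "I am cancelling next month.",
]
s = contextualise(
    thread, 1,
    data_type="email body",
    author="customer-42",
    channel="support email",
    stated_at=datetime(2015, 6, 1, 9, 30),
)
print(s.previous, "|", s.text, "|", s.following)
```

Read in isolation, the middle statement looks like praise; with the statements before and after it attached, it reads as sarcasm from a customer about to churn.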
Most big data systems do not provide solutions for unstructured,
non-repetitive data. Yet this is where most of the value lies (Bill Inmon)
16. SOURCES
• RUSSOM, P. (2013) Managing Big Data. Renton (WA), USA: TDWI
Research
• INNECCO, P. (2015) Implications of Cloud Computing and Big Data on the
Roadmap Towards Business Intelligence. Unpublished Dissertation (MSc),
De Montfort University