This document discusses big data concepts including what makes data "big", Hadoop, HDFS, MapReduce, and the Hadoop ecosystem. It also covers big data applications such as enhancing the multichannel customer experience, big data revenue opportunities, and the current limitations of big data analytics. Additionally, it discusses privacy and security issues, case studies of data breaches, dimensional modeling concepts, and governance best practices for big data.
Big Data Analytics in Bangladesh (Pridesys IT Ltd.)
Organizations want to use all the data available to them for analytics. But they’ve been thwarted by data silos and top-down, mostly manual approaches to unifying data for analytics. A new approach, based on machine learning combined with human expert sourcing, dramatically speeds analytics’ time-to-value. It automates data unification end-to-end: from finding and connecting diverse data to interactive consumption by virtually anyone using any analytic tool.
Dama Ireland slides - Data Trust event, 9th June 2016 (Ken O'Connor)
Do we need a Data Trust / Data Quality Mark?
Presentation by Data Management Specialist, Ken O'Connor:
Our food packaging provides facts about the food we buy. It's required by law. These facts enable us to make informed decisions about the food we consume. What about when we seek to make informed decisions in our business processes? What do we know about the data we're consuming? How can we trust that the data we depend on is fit for the purpose for which we need it? In this presentation, you will learn:
Your rights and responsibilities as a data consumer and provider;
The questions you should ask about the data you consume;
The facts you should provide about the data you provide;
The need for a "Data Q-Mark" or a "Data Trust-Level".
The presentation was followed by a panel discussion with Ronan Brennan, CTO of Silverfinch (a MoneyMate company). In October 2015, Silverfinch announced it was handling €2.5 trillion of look-through assets for asset manager clients worldwide. Ronan shared the Silverfinch success story, which is built on solid data management practices, with the attendees.
Tamr | MDM and the Data Unification Imperative (Tamr_Inc)
A successful digital information strategy depends on being able to find, connect and consume diverse data sources repeatably and at scale. But top-down, deterministic data unification approaches (such as ETL, ELT and MDM) weren’t designed to scale to the variety of hundreds, thousands or tens of thousands of data silos. A new bottom-up, probabilistic approach to data unification complements MDM by providing the agility and scalability to exploit data variety.
Rapid digitization has resulted in the production of large volumes of unstructured data. This trend is expected to provide significant opportunities for the graph database market in the coming years.
A Dynamic Data Catalog for Autonomy and Self-Service (Denodo)
Watch Dave's presentation on-demand from the Fast Data Strategy Virtual Summit here: https://buff.ly/2Kj7muc
Denodo’s new dynamic catalog is the new black. It combines the power of a data delivery infrastructure with a data catalog, providing contextual information and collective intelligence.
Attend this session to discover:
• What is unique about the Dynamic Data Catalog
• How it empowers a community of analysts and decision makers
• How it facilitates data curation and data stewardship in your organization
A short overview of Business Intelligence: what BI is, how the BI market is growing, which vendors operate in the market today, and future directions.
Real Time Data Processing using Spark Streaming | Data Day Texas 2015 (Cloudera, Inc.)
Speaker: Hari Shreedharan
Data Day Texas 2015
Apache Spark has emerged over the past year as the heir apparent to Hadoop MapReduce. Spark can process data in memory at very high speed, while still being able to spill to disk if required. Spark’s powerful yet flexible API allows users to write complex applications easily without worrying about the internal workings of how the data gets processed on the cluster.
Spark comes with an extremely powerful Streaming API to process data as it is ingested. Spark Streaming integrates with popular data ingest systems such as Apache Flume, Apache Kafka, and Amazon Kinesis, allowing users to process data as it comes in.
In this talk, Hari will discuss the basics of Spark Streaming, its API, and its integration with Flume, Kafka, and Kinesis. Hari will also discuss a real-world example of a Spark Streaming application, and how code can be shared between a Spark application and a Spark Streaming application. Each stage of the application's execution will be presented, which helps illustrate good practices for writing such applications. Hari will finally discuss how to write a custom application and a custom receiver to receive data from other systems.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, converting all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
Real time Analytics with Apache Kafka and Apache Spark (Rahul Jain)
A presentation-cum-workshop on real-time analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging system, while Spark Streaming brings Spark's language-integrated API to stream processing, allowing streaming applications to be written quickly and easily in both Java and Scala. In this workshop we explore Apache Kafka, ZooKeeper, and Spark with a web clickstream example using Spark Streaming. A clickstream is a recording of the parts of the screen a computer user clicks on while web browsing.
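The clickstream idea described above can be illustrated without a cluster. Here is a minimal plain-Python sketch (all data invented for illustration) of the micro-batch model Spark Streaming uses: the stream is chopped into small batches, and clicks per page are counted within each batch.

```python
from collections import Counter

def count_clicks_per_batch(click_events, batch_size=3):
    """Split a stream of (user, page) click events into fixed-size
    micro-batches and count clicks per page within each batch."""
    batches = [click_events[i:i + batch_size]
               for i in range(0, len(click_events), batch_size)]
    return [Counter(page for _user, page in batch) for batch in batches]

clicks = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
          ("u3", "/home"), ("u2", "/checkout"), ("u3", "/cart")]
per_batch = count_clicks_per_batch(clicks)
# First batch counts: /home twice, /cart once.
```

In Spark Streaming the batching is driven by a time interval rather than a fixed count, and the counting runs distributed across the cluster, but the per-batch aggregation pattern is the same.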
The Top Skills That Can Get You Hired in 2017 (LinkedIn)
We analyzed all the recruiting activity on LinkedIn this year and identified the Top Skills employers seek. Starting Oct 24, learn these skills and much more for free during the Week of Learning.
#AlwaysBeLearning https://learning.linkedin.com/week-of-learning
B2B DATA: You Don't Have to Love It, But Don't Ignore It (amdia)
In the digital world, B2B advertisers are painfully aware of the importance of data for reaching their customers and prospects. Yet some of them find it a chore or a burden. In this session, we will explore what B2B advertisers can actually do to get the maximum business value from data about current and potential customers.
We will look at new data sources, how to keep your data up to date, the top 5 uses of data in B2B, and a case study of how a US company applies data to drive sustainable business growth.
Speaker:
Ruth P. Stevens
She is a marketing professor at several business schools in the United States and a frequent contributor to trade publications such as Biznology, Target Marketing Magazine, and AdAge.
She is considered one of the 100 most influential people in business marketing in the United States by Crain's BtoB magazine. Her latest book is "B2B Data-Driven Marketing: Sources, Uses, Results". Stevens has held senior marketing positions at Time Warner, Ziff-Davis, and IBM.
More info: www.ruthstevens.com.
Changing audience expectations mean your marketing message needs to be consistent, in context, personal, and relevant, irrespective of which channel your listener uses. Without that, your customer sales, retention, spend, and lifetime value all suffer. So you market, yet which marketing channel is most effective? To understand and anticipate your audience's needs, you need a real-time, 360-degree customer view. Creating it requires torrents of new structured and unstructured data, in addition to the means to derive insight from it. That's where most organisations falter.
Customer Success sits at the center of a company’s data web. But data challenges exist! It’s time to break through all data roadblocks standing in your way.
Talking about Big Data generates a lot of questions; however, most of the focus is on the technologies and skills required to collect and store this volume of information as opposed to the insight that companies need to derive from it. What factors should organizations consider in order to ensure that they are capitalizing on their investments with these technologies? How do you break through business silos to enable sharing of data to increase organizational value? Leveraging his cross-industry experience at companies like The Walt Disney Company, Travelers Insurance and Demand Media, Brendan Aldrich will discuss the question of “big value” with industry examples and a particular focus on his current work to deploy a “data democracy” within the City Colleges of Chicago.
Session Discovery Topics:
• Big value - keeping an eye on the forest (assumptions, judgment and bias)
• Data democracy - increasing productivity with data transparency and open access
The objective of this module is to gain an overview of how to use the data you already have in order to improve your business.
Upon completion of this module you will:
Gain an understanding of how to take advantage of the data you already have
Understand where internal data already resides within your company
Improve your knowledge of how data can help build your brand
Applying Data Quality Best Practices at Big Data Scale (Precisely)
Global organizations are investing aggressively in data lake infrastructures in the pursuit of new, breakthrough business insights. At the same time, however, 2 out of 3 business executives are not highly confident in the accuracy and reliability of their own Big Data. Regaining that confidence requires utilizing proven data quality tools at Big Data scale.
In this on-demand webinar, discover how to ensure your data lake is a trusted source for advanced business insights that lead to new revenue, cost savings and competitiveness. You will have the opportunity to:
• Compare your organization’s data lake “readiness” against initial findings from our upcoming annual Big Data Trends survey
• Gain insight into where and how to leverage data quality best practices for Big Data use cases
• Explore how a ‘Develop Once, Deploy Anywhere’ approach, including deployment to native Big Data infrastructures such as Hadoop and Spark, facilitates consistent data quality patterns
Data Done Right: Ensuring Information Integrity (Sharala Axryd)
It’s the ultimate “garbage in, garbage out” quandary. Data can be an organization’s most valuable asset — but only to the degree its quality can be validated and trusted. Without the right guidelines, processes, and solutions in place to control the way applications, systems, databases, messages, and documents are managed, "dirty" data can permeate systems across the enterprise, negatively impacting everything from strategic planning to day-to-day decision making. High-quality data drives a company’s success more efficiently because decisions are based on facts rather than habit or intuition.
To gain a better understanding of this topic, this speaking session will examine:
- what data quality and master data management are
- why they are crucial for successful business operations and strategies
- how to improve data quality through organizational, procedural, and technological means
Rplus offers an analytics solution for the retail industry through its cloud-based DemandSense application and big data analytics platform. Retail companies can leverage data to improve the profitability and efficiency of operations at low cost and in a shorter timeframe.
The business models across industries around the world are becoming customer-centric. Recent studies show that “knowing” customers based on internal as well as external data is one of the top priorities of business leaders. Various surveys also reveal that customers do not mind sharing their semi-personal data in exchange for differentiated service.

In that context, the 360-degree view of the customer, once thought to be a business process, master data management, data integration, and data warehouse / business intelligence problem, has now entered the whole new world of big data, including integration with unstructured data sources. The impact of big data on customer master data management spans from the integration and linkage of unstructured or semi-structured data with the structured master data maintained within the enterprise, to the analysis and visualization of that data to generate useful insights about customers. There are various patterns for handling the challenges across these steps: acquire, link, manage, analyze, and distribute the enhanced customer data for differentiated products or services.
How to Monetize Your Data Assets and Gain a Competitive Advantage (CCG)
Join us for this session where Doug Laney will share insights from his best-selling book, Infonomics, about how organizations can actually treat information as an enterprise asset.
Improve your business with your own business data (Data-Set)
The objective of this module is to gain an overview of how to use the data you already have in order to improve your business.
Upon completion of this module you will:
-Gain an understanding of how to take advantage of the data you already have
-Understand where internal data already resides within your company
-Improve your knowledge of how data can help build your brand
15. Hadoop Ecosystem
(MIS 6309 Business Data Warehousing, Fall 2014 - Group 6)
16. Big Data Landscape
17. What's in store for us?
• More jobs
• More opportunities
• More money!
18. Big Data Landscape
19. Big Data Landscape
20. Big Data Landscape
21. Sectors Using Big Data
Enhancing the multichannel consumer experience:
• Use big data to integrate promotions and pricing for shoppers seamlessly, whether those consumers are online, in-store, or perusing a catalog.
• Integrate customer databases with household information such as income, housing values, and number of children, and thus create different versions of catalogs, etc., attuned to the behavior and preferences of different groups of customers.
24. Current Limitations for Big Data Analytics
• Meeting the need for speed
• Understanding the data
• Addressing data quality
• Displaying meaningful results
• Big data skills are in short supply
25. Problems & Threats - Big Data
• Privacy breaches and embarrassments
• Anonymization could become impossible
• Data masking could be defeated to reveal personal information
• Unethical actions based on interpretations
• Big data analytics are not 100% accurate
• Discrimination
• Few (if any) legal protections exist for the individuals involved
• Big data will probably exist forever
• Concerns for e-discovery
• Making patents and copyrights irrelevant
26. Case Studies - Recent Data Breaches
• Target breach, in which 40 million credit and debit accounts were compromised over a three-week period; Target lost $148 million.
• JP Morgan reported that 76 million households and 8 million small businesses were exposed in a data breach.
• Customer names, addresses, phone numbers, and e-mail addresses were taken.
• Hackers also obtained internal data identifying customers by category, such as whether they are clients of the private-bank, mortgage, auto, or credit-card divisions, said a person briefed on the matter.
• Third-party / external data in the news: banks turn to Facebook and Twitter to keep track of education-loan takers.
27. Thinking Dimensionally

Sentiment_Analysis table (fact):
• Sentiment_ID (e.g. 1, 2, 3)
• Sentiment_description (e.g. "Wow", "Awesome", "Crap")
• Customer_ID
• Product_ID

Dim_Customer (dimension):
• Customer_ID
• Customer_Name
• Gender
• Age

Dim_Product (dimension):
• Product_ID
• Product_Name
• Category
• Product_Description

Data - big or small
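The star schema above can be exercised with a small join. The following Python sketch (sample rows and names invented for illustration) resolves the fact table's dimension IDs against the two dimension tables:

```python
# Fact table: one row per sentiment event, keyed by dimension IDs.
sentiment_analysis = [
    {"Sentiment_ID": 1, "Sentiment_description": "Wow", "Customer_ID": 10, "Product_ID": 100},
    {"Sentiment_ID": 2, "Sentiment_description": "Crap", "Customer_ID": 11, "Product_ID": 101},
]
# Dimension tables, indexed by primary key for O(1) lookup.
dim_customer = {10: {"Customer_Name": "Alice", "Gender": "F", "Age": 34},
                11: {"Customer_Name": "Bob", "Gender": "M", "Age": 29}}
dim_product = {100: {"Product_Name": "Phone", "Category": "Electronics"},
               101: {"Product_Name": "Mug", "Category": "Kitchen"}}

def join_star(fact_rows):
    """Resolve each fact row against its dimensions (a star-schema join)."""
    return [{"sentiment": f["Sentiment_description"],
             "customer": dim_customer[f["Customer_ID"]]["Customer_Name"],
             "product": dim_product[f["Product_ID"]]["Product_Name"]}
            for f in fact_rows]

rows = join_star(sentiment_analysis)
# Each result pairs a sentiment with a readable customer and product name.
```

In a real warehouse this join is expressed in SQL against fact and dimension tables, but the ID-to-attribute resolution is the same.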
28. Conformed Dimensions

Online_Customer table:          Store_Customer table:
Customer Name   Location        Customer Name   Location
Avadhoot Patil  Dallas          Ankur Kaushik   Dallas

Sort and merge into a single conformed customer dimension:

Customer Name   Location
Avadhoot Patil  Dallas
Ankur Kaushik   Dallas
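The sort-and-merge step behind conformed dimensions can be sketched in a few lines of Python. This is a minimal illustration (records and the cleansing rule are invented, including a deliberate "Dalllas" typo standing in for dirty source data):

```python
LOCATION_FIXES = {"Dalllas": "Dallas"}  # illustrative cleansing rule

def conform_customers(*sources):
    """Standardize, deduplicate, and sort customer records from
    multiple channels into one conformed customer dimension."""
    seen, merged = set(), []
    for source in sources:
        for rec in source:
            clean = {"name": rec["name"].strip(),
                     "location": LOCATION_FIXES.get(rec["location"], rec["location"])}
            key = (clean["name"], clean["location"])
            if key not in seen:          # same customer from two channels -> one row
                seen.add(key)
                merged.append(clean)
    return sorted(merged, key=lambda r: r["name"])

online_customers = [{"name": "Avadhoot Patil", "location": "Dallas"}]
store_customers = [{"name": "Ankur Kaushik", "location": "Dalllas"}]
conformed = conform_customers(online_customers, store_customers)
```

The key point of a conformed dimension is that every fact table (online sales, store sales) joins against this one cleaned customer list, so reports across channels agree.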
29. Selecting Keys

Airport Data_source (natural keys only):

Airport Name  City    Country
ABC           Dallas  USA

Data warehouse system (with durable surrogate keys):

Airport_ID  Airport Name  City    Country
1001        ABC           Dallas  USA
1002        XYZ           Dallas  USA

• Anchor dimensions with durable surrogate keys.
• Natural keys from the source are replaced by durable surrogate keys in the warehouse, which also supports slowly changing dimensions.
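The surrogate-key assignment shown on this slide can be sketched as a small key generator. This is an illustrative Python sketch (class name, starting value 1001, and airport rows are assumptions matching the slide's example):

```python
import itertools

class SurrogateKeyGenerator:
    """Assign a durable surrogate key to each new natural key.
    Reloading the same natural key returns the same surrogate,
    so the key stays stable across warehouse loads."""
    def __init__(self, start=1001):
        self._keys = {}
        self._counter = itertools.count(start)

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._counter)
        return self._keys[natural_key]

gen = SurrogateKeyGenerator()
abc = gen.key_for(("ABC", "Dallas", "USA"))    # first airport gets 1001
xyz = gen.key_for(("XYZ", "Dallas", "USA"))    # next gets 1002
again = gen.key_for(("ABC", "Dallas", "USA"))  # same natural key, same surrogate
```

Decoupling the warehouse key from the source's natural key is also what makes slowly changing dimensions possible: a changed attribute can get a new surrogate row without disturbing the natural key.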
30. Governance
• Dimensionalize data before applying governance.
• Dimensionalize data as early as possible in the data pipeline.
• Governance steps: parse, match, identify, and resolve on the fly.
31. Privacy
• Privacy is the most important governance perspective.
• For most forms of analysis, personal details should be masked.
• Data should be aggregated enough not to allow identification of individuals.
• Data should be masked or encrypted on write, or masked on read.
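The masking recommendation above can be illustrated with a simple pseudonymization sketch in Python (field names, the sample record, and the inline salt are invented for illustration; a real deployment would keep the salt in a secret store):

```python
import hashlib

def mask_record(record, pii_fields=("name", "email")):
    """Replace PII fields with a salted SHA-256 pseudonym so records
    remain joinable for analysis without exposing personal details."""
    salt = "example-salt"  # assumption: in practice, a managed secret
    masked = dict(record)  # leave the original record untouched
    for field in pii_fields:
        if field in masked:
            digest = hashlib.sha256((salt + str(masked[field])).encode()).hexdigest()
            masked[field] = digest[:12]  # short, stable pseudonym
    return masked

row = {"name": "Avadhoot Patil", "email": "a@example.com", "spend": 120}
safe = mask_record(row)
# Non-PII fields like "spend" pass through; "name" becomes a stable pseudonym.
```

Because the same input always hashes to the same pseudonym, analysts can still group and join on the masked column (masking on read), while the raw personal details never reach the analysis layer.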
32. THANK YOU !