BIG DATA
ASSIGNMENT
Submitted to: Mr. Vivek Gautam
Submitted by: Anuja Chatterjee
Roll No. 19DM039
PGDM
Birla Institute of Management Technology
December 2019
Assignment – 1
Questions for “Big Data Analytics” Course
Big Data: Introduction to Big Data, its origination, explosion and
challenges
1. How will you define “Big Data”?
Ans: Big data refers to data assets whose volume, velocity and variety are so high that they
require specific technology and analytical methods to be transformed into value. Traditional
data-processing tools cannot handle such data.
2. What led to the origination of Big Data?
The term Big Data is often credited to Roger Mougalas, who used it in 2005. The practice of
analysing data is much older, however: in 1663, John Graunt presented one of the first recorded
statistical analyses of data in his book ‘Natural and Political Observations Made upon the
Bills of Mortality’. Modern data processing is usually traced to 1889, when Herman Hollerith
invented a tabulating system in an attempt to organize census data. ‘Colossus’, one of the
earliest data-processing machines, was developed by the British in 1943, during World War II,
to decipher Nazi codes. In 1965 the United States government set out to build the first data
centre, intended to store millions of tax returns and fingerprint sets. This initiative is
regarded as the starting point of large-scale electronic storage. In 2005, work began on
Hadoop, a now open-source framework that Yahoo developed heavily and that grew out of an
effort to index the entire World Wide Web, as people began to realise how much data is
generated each day through social media and other internet platforms. NoSQL databases also
began to gain popularity during this time. Although big data seems to have been around for a
long time and to be nearing its peak, it may still be in its formative stages. The data
volumes of the near future may make today's big data seem like a paltry amount.
3. What is the difference between structured, unstructured and semi-structured data?
Structured data: Structured data is the easiest to search and organize, because it is usually
held in rows and columns and its elements can be mapped to fixed, pre-defined fields. Entities
can be grouped together to form relations. This makes structured data easy to store, analyze
and search, and until recently it was the only data businesses could easily use. It is often
managed with SQL (Structured Query Language).
Examples of structured data include financial data such as accounting transactions, address
details, demographic information, star ratings by customers, machine logs, location data from
smartphones and smart devices, etc.
Unstructured Data: Unstructured data is data that cannot be held in a row-column database and
has no associated data model. Most of the data in the world today is unstructured. The lack
of structure makes it more difficult to search, manage and analyse, which is why companies
widely discarded unstructured data until the recent proliferation of artificial intelligence
and machine learning algorithms made it easier to process. Instead of spreadsheets or
relational databases, unstructured data is usually stored in data lakes, NoSQL databases and
the applications that produce it. Examples: photos, video and audio files, text files, social
media content, satellite imagery, presentations, PDFs, open-ended survey responses, websites,
etc.
Semi-structured Data: The third category is semi-structured data. It has some defining or
consistent characteristics but does not conform to a structure as rigid as a relational
database expects. It carries organizational properties such as semantic tags or metadata that
make it easier to organize, but the data itself remains fluid. Email messages are a good
example. While the message body is unstructured, each message also contains structured data
such as the name and email address of the sender and recipient, the time sent, etc.
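The email example can be made concrete with a short sketch. It uses Python's standard `email` module to split a message into its structured header fields and its unstructured body. The sample message, the addresses and the `split_email` helper are all made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message, invented for this illustration.
RAW_EMAIL = """\
From: Asha Rao <asha@example.com>
To: Ravi Mehta <ravi@example.com>
Date: Mon, 02 Dec 2019 10:15:00 +0530
Subject: Quarterly numbers

Hi Ravi, the numbers look better than last quarter. Let's talk tomorrow.
"""


def split_email(raw):
    """Return (structured header fields, unstructured body text)."""
    msg = message_from_string(raw)
    # Header fields have fixed names, so they map cleanly to columns.
    headers = {
        "sender": msg["From"],
        "recipient": msg["To"],
        "sent": msg["Date"],
        "subject": msg["Subject"],
    }
    # The body is free text with no pre-defined fields.
    body = msg.get_payload().strip()
    return headers, body


headers, body = split_email(RAW_EMAIL)
print(headers)
print(body)
```

The header dictionary could be loaded straight into a table, while the body would need text analysis before it yields anything comparable.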
4. In the lecture, we discussed how various businesses are taking
advantage of Big Data. Please provide one use case each for the retail and
banking industries.
Retail Industry:
Personalizing Customer Experience
For retailers, big data can create opportunities to provide better customer experiences. Costco
uses its purchase data to help keep customers healthy. When a California fruit packing company
warned Costco about possible listeria contamination in fruits such as peaches and plums,
Costco was able to email the specific customers who had purchased the affected items instead
of sending a blanket email to its entire mailing list.
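A targeted notification of this kind comes down to filtering purchase records by the recalled items. The sketch below is only an illustration: the purchase log, the product codes and the `customers_to_notify` helper are hypothetical and do not describe Costco's actual system.

```python
# Hypothetical purchase log: (customer_email, product_code) pairs.
PURCHASES = [
    ("a@example.com", "PEACH-01"),
    ("b@example.com", "BREAD-07"),
    ("c@example.com", "PLUM-03"),
    ("a@example.com", "PLUM-03"),
]


def customers_to_notify(purchases, recalled_codes):
    """Return the distinct customers who bought any recalled product."""
    recalled = set(recalled_codes)
    # A set comprehension removes duplicates when one customer bought several items.
    return sorted({email for email, code in purchases if code in recalled})


affected = customers_to_notify(PURCHASES, ["PEACH-01", "PLUM-03"])
print(affected)
```

Only the two customers who bought the recalled fruit are selected. The customer who bought bread is left out.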
Forecasting Demand in Retail
Beyond a retailer's own sales data, some algorithms analyze social media and web browsing
trends to predict the next big thing in the retail industry. One of the most interesting data
points for forecasting demand is the weather.
Brands like Walgreens and Pantene worked with the Weather Channel to account for weather
patterns in order to customize product recommendations for consumers. Walgreens and
Pantene anticipated increases in humidity, when shoppers would be looking for anti-frizz
products, and served up ads and in-store promotions to drive sales.
Banking Industry:
Business intelligence (BI) tools can identify potential risks in a bank's lending processes.
With the help of big data analytics, banks can analyze market trends and decide whether to
lower or raise interest rates for different customers across various regions.
Data entry errors from manual forms can be kept to a minimum, because big data analytics also
points out anomalies in customer data.
With credit-scoring models, customers who have poor credit scores can be identified so
that banks do not lend money to them. Another major application in banking is fraud
detection: limiting fraudulent or dubious transactions that could fund anti-social activities
or terrorism.
5. In the lecture, we observed various challenges associated with Big
Data, but what are the biggest challenges associated with “Big Data” in
your own terms?
Lack of understanding: Organizations frequently do not know what big data really is, what
its advantages are, what infrastructure it requires, and so on. Without a reasonable
understanding, a big data deployment project is likely to end in disappointment.
Because big data is an enormous change for an organization, it should be accepted by top
management first and then at each level down the ladder. To ensure that big data is
understood and accepted at all levels, organizations need to arrange training sessions and
workshops.
Quality of data: Big data must be cleaned, prepared, verified, reviewed for compliance and
constantly maintained. The difficulty is that data arrives so quickly that organizations
find it hard to carry out all of the data preparation activities needed to guarantee good
data quality.
Security: Big data deployment projects often postpone security checks to a later stage, which
is not advisable.
Big data technologies are progressing, but their security features are still being neglected,
on the optimistic assumption that security will be handled at the application level.
6. Develop a use case to implement big data in your assignment and
address the following questions.
a. What are the challenges of gathering "Big Data"?
Banking Industry:
Legal and regulatory challenges are the main ones the banking industry faces in
implementing big data. Big data brings complexities and limitations because of its sheer
size. Many companies already have control and data management procedures in place
for small data, and are comfortable that those controls are appropriate. Given the
growing impact of regulation and oversight, banks are steering clear of big data,
or at least proceeding cautiously, simply because of the risks.
Privacy and security:
Big data offers banks great potential for major steps forward, but it also comes with
a large red flag concerning privacy and intrusion. The potential for abuse of this data is
significant, so banks need to get it right and use it only to increase customer
satisfaction.
Organisation Mindset:
Many banks are still driven mostly by past experience, intuition, subject-matter expert (SME)
knowledge and customer experience. They need more data curiosity and data-driven thinking,
and need to invest more in acquiring, storing and analyzing data.
Talent management:
Blending data scientists and visualization teams is a new workforce management paradigm.
Big data specialists need a solid understanding of the business, programming skills in
SAS, R, SQL or Python, and statistical knowledge, along with visualization skills.
Data Quality:
Data quality attributes (validity, accuracy, timeliness, reasonableness, completeness, and so
forth) must be clearly defined, measured, recorded, and made available to end users. For big
data quality and data management, banks need to create data quality metadata that includes
data quality attributes, measures, business rules, mappings, cleansing routines, data element
profiles, and controls.
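As a rough illustration of what "defined and measured" can mean, the sketch below scores a handful of customer records on three of the attributes named above. The records, the field names, the one-year freshness window and the rule that a balance must be non-negative are all assumptions chosen for the example, not rules taken from any bank.

```python
from datetime import date

# Hypothetical customer records, invented for this illustration.
RECORDS = [
    {"id": 1, "email": "a@example.com", "balance": 1200.0, "updated": date(2019, 11, 30)},
    {"id": 2, "email": None, "balance": 50.0, "updated": date(2019, 12, 1)},
    {"id": 3, "email": "c@example.com", "balance": -10.0, "updated": date(2018, 1, 15)},
]


def quality_report(records, as_of, max_age_days=365):
    """Return the share of records passing each data quality check."""
    total = len(records)
    # Completeness: every field in the record has a value.
    complete = sum(all(v is not None for v in r.values()) for r in records)
    # Validity: assumed business rule that balances are never negative.
    valid = sum(r["balance"] is not None and r["balance"] >= 0 for r in records)
    # Timeliness: the record was updated within the allowed window.
    timely = sum((as_of - r["updated"]).days <= max_age_days for r in records)
    return {
        "completeness": complete / total,
        "validity": valid / total,
        "timeliness": timely / total,
    }


report = quality_report(RECORDS, as_of=date(2019, 12, 15))
print(report)
```

Each record here fails exactly one check, so every measure comes out at two thirds. Storing such scores alongside the data is one simple form of the data quality metadata described above.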
b. What benefits can you derive from data analysis?
Risk management: The banking industry is built on risk, so every loan and
investment needs to be evaluated. Big data can give banks new insights into their
systems, transactions, customers and environments to help them avoid certain risks.
Marketing automation: With the volumes of data available today, banks can gather
previously unimaginable information about each of their customers. This gives them
a better understanding of customers’ needs and helps them to address these needs
proactively. It also allows different departments within a bank, such as marketing,
sales and IT, to work more cohesively as a single unit.
Transaction Channel Identification: Banks benefit greatly from knowing whether
their customers withdraw the whole available sum in cash on payday or prefer
to keep their money on a credit or debit card. The latter customers can be
approached with offers for short-term investment products with high payout rates, etc.
Fraud management: Knowing an individual's usual spending patterns helps
raise a red flag when something out of the ordinary happens. If a cautious investor who
prefers to pay by card attempts to withdraw all the money from the account via an ATM,
this might mean the card was stolen and is being used by fraudsters. A call from the bank
asking the cardholder to confirm the operation makes it easy to tell whether it is a
legitimate request or fraudulent activity the cardholder does not know about.
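One simple way to express "usual spending patterns" is a z-score rule: flag a transaction that lies many standard deviations away from the customer's historical average. This is a minimal sketch of that idea, not a description of how banks actually detect fraud. The withdrawal history, the `is_unusual` helper and the threshold of 3 are assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # Not enough history to judge.
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # All past amounts identical: any change stands out.
    return abs(amount - mu) / sigma > threshold


# Hypothetical past ATM withdrawals for one customer.
past_withdrawals = [40, 55, 60, 45, 50, 52, 48]

print(is_unusual(past_withdrawals, 58))    # close to the usual range
print(is_unusual(past_withdrawals, 5000))  # far outside the usual range
```

A flagged transaction would then trigger the confirmation call described above rather than an automatic block.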
Assignment – 5
Please mention 5 differences between HBase, MongoDB and Cassandra.
Ans:
Parameter | HBase | MongoDB | Cassandra
----------|-------|---------|----------
Protocol | HTTP/REST (also Thrift) | Custom binary (BSON) | CQL3 and Thrift
Server OS | Linux, Unix, Windows | Linux, OS X, Solaris, Windows | FreeBSD, Linux, OS X, Windows
Replication | Master-slave replication | Master-slave replication | Masterless ring
Key Point | Billions of rows and millions of columns | Retains SQL-friendly properties such as queries and indexes | Stores large data sets with an almost-SQL query language
Popular Use Cases | Online log analytics, Hadoop, write-heavy applications, MapReduce | Operational intelligence, product data management, content management systems, IoT, real-time analytics | Sensor data, messaging systems, e-commerce websites, always-on applications, fraud detection for banks
