Vikas Samant is a big data and data science engineer who works with Entrench Electronics and Pentaho. He provides an overview of big data, defining it as large volumes of structured, semi-structured, and unstructured data that businesses must process daily. He describes the key characteristics of big data using the 3Vs - volume, variety, and velocity, and sometimes a fourth V of veracity. The document then discusses data structures, data science, the data science process, and provides examples of big data use cases like optimizing funnel conversion, behavioral analytics, customer segmentation, and fraud detection. It concludes with an overview of big data technologies, vendors, what Hadoop is, and why Hadoop is widely adopted.
5. “Big data is a term that describes the large volume of data (structured, semi-structured and unstructured) that overpowers a business on a day-to-day basis.”
6. Big Data Contd…
Big data can be analyzed for insights that lead to better decisions and strategic business moves.
10. 1. Volume
Volume refers to the vast amounts of data generated every second. We are no longer talking about terabytes but about zettabytes or brontobytes.
If we take all the data generated in the world between the beginning of time and 2000, the same amount of data will soon be generated every minute.
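To give a rough sense of the scale involved, the gap between these units can be worked out directly. The sketch below assumes decimal (SI) prefixes. The `UNITS` table and the `convert` helper are illustrative names, and the brontobyte is left out because it is an informal unit without a standard definition.

```python
# Decimal (SI) byte units; binary units (KiB, MiB, ...) would differ slightly.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, src, dst):
    """Convert a quantity of data from one unit to another."""
    return value * UNITS[src] / UNITS[dst]


# One zettabyte expressed in terabytes.
tb_per_zb = convert(1, "ZB", "TB")
print(f"1 ZB = {tb_per_zb:,.0f} TB")
```

In other words, a zettabyte is a billion terabytes.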
11. 2. Velocity
Velocity is the rate at which incoming data arrives and needs to be processed. The flow of data is massive and continuous.
Think about how many SMS messages, Facebook status updates, or credit card swipes are sent over a particular telecom carrier every minute of every day, and you’ll have a good appreciation of velocity.
12. 3. Variety
Variety refers to the different types of data we can now use. In the past we focused only on structured data that fitted neatly into tables or relational databases, such as financial data.
An often-cited estimate is that 80% of the world’s data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types.
13. 4. Veracity
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable.
Just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, and of how uncertain the reliability and accuracy of their content is. Technology now allows us to work with this type of data.
14. Big Data: Data Structure
Structured
Semi-Structured
“Quasi” Structured
Unstructured
15. 1. Structured Data
Data containing a defined data type, format, and structure.
Example: transaction data and data in databases.
16. 2. Semi-Structured Data
Textual data files with a discernible pattern that enables parsing.
Example: XML data files that are self-describing and defined by an XML schema.
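A minimal sketch of what "a discernible pattern that enables parsing" means in practice, using Python's standard `xml.etree.ElementTree`. The `<orders>` document, its element names, and the `parse_orders` helper are invented for illustration. Note that ElementTree only checks that the XML is well-formed; it does not validate against an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing XML: tags and attributes name each value.
XML_DOC = """
<orders>
  <order id="1001">
    <customer>Asha</customer>
    <amount currency="INR">2500.00</amount>
  </order>
  <order id="1002">
    <customer>Ravi</customer>
    <amount currency="INR">799.50</amount>
  </order>
</orders>
"""


def parse_orders(text):
    """Turn the XML text into a list of plain dictionaries (rows)."""
    root = ET.fromstring(text.strip())
    rows = []
    for order in root.findall("order"):
        amount = order.find("amount")
        rows.append({
            "id": int(order.get("id")),
            "customer": order.findtext("customer"),
            "amount": float(amount.text),
            "currency": amount.get("currency"),
        })
    return rows


for row in parse_orders(XML_DOC):
    print(row)
```

Because the tags carry the meaning of each value, the same few lines work for any number of orders, which is what separates semi-structured data from free text.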
17. 3. Quasi-Structured Data
Textual data with erratic data formats that can be formatted with effort, tools, and time.
Example: Web clickstream data that may contain some inconsistencies in data values and formats.
http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
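The clickstream URL above can be pulled apart with a little effort, which is the sense in which such data is "quasi" structured. The sketch below uses Python's standard `urllib.parse`. The `parse_click` helper is an illustrative name, and reading `q` as the current query and `pq` as the previous query is an assumption based on how the values look, not something the slide states.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the slide, joined into one string.
URL = (
    "http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q="
    "data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=d"
    "ata+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf."
    "osb&fp=d566e0fbd09c8604&biw=1382&bih=651"
)


def parse_click(url):
    """Extract key/value parameters from a clickstream URL."""
    parts = urlsplit(url)
    # This URL carries its parameters after '#', so fall back to the fragment
    # when there is no ordinary '?query' part.
    raw = parts.query or parts.fragment
    params = parse_qs(raw, keep_blank_values=True)
    # Keep the first value of each parameter for simplicity.
    return {key: values[0] for key, values in params.items()}


click = parse_click(URL)
print(click["q"])   # the search text, with '+' decoded to a space
print(click["pq"])
print([key for key, value in click.items() if value == ""])  # empty fields
```

Some fields come back empty and others have opaque values, which illustrates the inconsistencies in values and formats mentioned above.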
18. 4. Unstructured Data
Data that has no inherent structure and is usually stored as different types of files.
Example: text documents, PDFs, images and video.
20. Big Data and Data Science
How does Big Data relate to Data Science?
21. Big Data and Data Science
Data Science is the process of deriving insights from big data to inform decisions and support organizations.
26. Big Data Use Cases:
1. Optimize Funnel Conversion
2. Behavioral Analytics
3. Customer Segmentation
4. Fraud Detection
28. 1. Optimize Funnel Conversion
Big data analytics allows companies to track leads through the entire sales conversion process, from a click on an AdWords ad to the final transaction, in order to uncover insights on how the conversion process can be improved.
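As a minimal sketch of funnel tracking, the code below computes the step-to-step and overall conversion rate for each stage. The stage names and counts in `FUNNEL` are made-up illustrative numbers, and `funnel_report` is a hypothetical helper, not part of any particular analytics product.

```python
def funnel_report(stage_counts):
    """Return (stage, count, step_rate, overall_rate) for each funnel stage.

    stage_counts is an ordered list of (stage_name, lead_count) pairs,
    starting at the top of the funnel.
    """
    report = []
    top = stage_counts[0][1]
    previous = top
    for name, count in stage_counts:
        step_rate = count / previous if previous else 0.0
        overall_rate = count / top if top else 0.0
        report.append((name, count, step_rate, overall_rate))
        previous = count
    return report


# Hypothetical counts, from ad click to final transaction.
FUNNEL = [
    ("ad_click", 10000),
    ("product_view", 4000),
    ("add_to_cart", 1000),
    ("purchase", 250),
]

for name, count, step, overall in funnel_report(FUNNEL):
    print(f"{name:<14}{count:>7}  step {step:6.1%}  overall {overall:6.1%}")
```

The stage with the lowest step rate is the natural place to look first for improvements to the conversion process.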
31. 2. Behavioral Analytics
With access to data on consumer behavior, companies can learn what prompts a customer to stick around longer, as well as learn more about their customers’ characteristics and purchasing habits, in order to improve marketing efforts and boost profits.
32. Company: Nestle
Industry: Food and Beverage
Employees: 38000
Type: Behavioral Analytics
Purpose: Customer complaints and PR crises have become more difficult to handle because of social media. To keep better track of customer sentiment and of what is being said about the company online, Nestle created a 24/7 monitoring center to listen to all of the conversations about the company and its products on social media. The company actively engages with those who post about it online in order to mitigate damage and build customer loyalty.
34. 3. Customer Segmentation
By accessing data about the consumer from multiple sources, such as social media data and transaction history, companies can better segment and target their customers and start to make personalized offers to those customers.
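A rough sketch of combining two sources into segments is shown below. It uses simple hand-written rules rather than a statistical clustering method. The customer IDs, the sample data, the segment names, and the `spend_threshold` value are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: transaction history and social media mention counts.
TRANSACTIONS = [("c1", 120.0), ("c1", 80.0), ("c2", 15.0), ("c3", 300.0)]
SOCIAL_MENTIONS = {"c1": 5, "c3": 0}


def segment_customers(transactions, mentions, spend_threshold=150.0):
    """Assign each customer a segment from total spend and social activity."""
    spend = defaultdict(float)
    for customer_id, amount in transactions:
        spend[customer_id] += amount

    segments = {}
    for customer_id, total in spend.items():
        high_spend = total >= spend_threshold
        engaged = mentions.get(customer_id, 0) > 0
        if high_spend and engaged:
            segments[customer_id] = "advocate"
        elif high_spend:
            segments[customer_id] = "high_value"
        elif engaged:
            segments[customer_id] = "engaged"
        else:
            segments[customer_id] = "occasional"
    return segments


print(segment_customers(TRANSACTIONS, SOCIAL_MENTIONS))
```

Each segment could then receive a different personalized offer. In practice the rules would usually be replaced by a model fitted to the data.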
42. What is the Hadoop Framework?
Hadoop is an open source framework that supports the processing and storage of extremely large data sets in a distributed computing environment built on commodity hardware.
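Hadoop's processing layer is commonly programmed with the MapReduce model, which the slide does not cover. The sketch below imitates that model in a single Python process to show the idea of splitting work into map, shuffle, and reduce steps. It does not use any Hadoop API, and the function names and sample lines are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


LINES = ["big data big insights", "data science uses big data"]
print(word_count(LINES))
```

On a real cluster the map and reduce steps run in parallel on many machines, each working on the blocks of data stored locally, which is what lets the approach scale on commodity hardware.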
43. Why Hadoop?
Studies have projected that by 2020, 80% of all Fortune 500 companies will have adopted Hadoop.
A study by the McKinsey Global Institute predicted that by 2020, the use of big data analytics will increase annual GDP in the manufacturing and retail industries by as much as $325 billion.