Big data
1. “Rayat Shikshan Sanstha’s”
Yashwantrao Chavan Institute of Science, Satara
Department of Statistics
2017-2018
Seminar on
“Applications of Statistics in Big Data”
Presented by:
Wagaj Rahul Shamkarna
M.Sc.-1
Roll No. 119
2. Introduction
Every day, big data is making its influence felt in our lives; it is often called the next big
thing in the world. Many of the most useful innovations of the past 20 years have been made
possible by massive data-gathering capabilities combined with rapidly improving technology.
For example, when we need to find information we use the Google search engine, and for online
shopping we use platforms such as Amazon and Flipkart, as well as startup companies. In online
shopping, the seller is able to provide product reviews and recommendations for future
purchases. These recommendations are enabled by applications of “Big data”: highly
sophisticated data analysis identifies items that tend to be purchased by the same consumer.
In addition to search, big data is making a major impact in a surprising number of other areas
that affect our daily lives.
3. What is big data?
As data gets bigger, it requires different approaches.
The aim is to solve new problems, or to solve existing problems in a better way.
Big data generates value from the storage and processing of very large quantities of digital
information that cannot be analyzed with traditional computing techniques.
Walmart handles more than 1 million customer transactions every hour.
Facebook handles 40 billion photos from its user base.
4. Characteristics of big data
1) Volume: Volume is easy to understand: there is a lot of data. A typical PC had 200
gigabytes of storage in 2000; today Facebook takes in 500 terabytes of new data every day.
e.g. YouTube, cellphones, WhatsApp, internet forums.
2) Velocity: Velocity means that data comes in faster than ever and must be stored
faster than ever.
e.g. Users generate millions of events per second; high-frequency stock trading algorithms
reflect market changes within microseconds.
3) Variety: Big data isn’t just numbers or strings. It also includes 3D data, audio and
video, and unstructured text, including log files and social media.
5. Selecting big data stores
Choosing the correct data stores based on your data characteristics.
Moving code to the data.
Implementing polyglot data store solutions.
Aligning business goals to the appropriate data store.
6. Why big data?
Increase in storage capacities.
Increase in processing power.
Availability of data.
Every day, 2.5 quintillion bytes of data are created; 90% of the data in the world
today has been created in the last two years alone.
e.g. Facebook generates a lot of data daily; according to IBM, 90% of the data stored
today was generated in just the last two years.
7. How is big data different?
Automatically generated by machines
Typically an entirely new source of data, e.g. use of the internet
Not designed to be friendly
May not have much value: need to focus on the important part
8. Statistical Analysis of Big data
Gathering and storing massive quantities of data is a major challenge, but
ultimately the biggest and most important challenge of big data is putting it to
good use.
Many statistical techniques can be used to analyze data and find
useful patterns:
e.g. 1) Probability distributions: you would use a distribution to compute the
likelihood of a given number of events occurring over an interval of time.
2) Normal distribution, Student’s t-distribution, chi-square distribution,
F-distribution, regression analysis, time series analysis.
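The slide does not name the distribution used for counting events over an interval. A common choice for that situation is the Poisson distribution, so the sketch below assumes it. The function names and the figure of 3 requests per second are hypothetical, chosen only for illustration.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in an interval whose mean count is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def poisson_cdf(k: int, rate: float) -> float:
    """Probability of at most k events in the interval."""
    return sum(poisson_pmf(i, rate) for i in range(k + 1))


# Hypothetical figure (not from the slides): a server averages 3 requests per second.
average_per_second = 3.0

# Chance of seeing exactly 5 requests in one second.
print(poisson_pmf(5, average_per_second))

# Chance of seeing more than 5 requests in one second.
print(1 - poisson_cdf(5, average_per_second))
```

The same two functions answer questions such as "how likely is an unusually busy second?", which is the kind of likelihood the slide describes.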
9. Regression analysis
Regression analysis is used to estimate the strength and direction of the relationship between
variables that are linearly related to each other. Two variables X and Y are said to be linearly
related if the relationship between them can be written in the form Y = aX + b, where a is the
slope, or the change in Y due to a given change in X, and b is the intercept, or the value of Y
when X = 0.
e.g. Suppose a corporation wants to determine whether its advertising expenditures are
actually increasing profits and, if so, by how much. The corporation gathers data on advertising
and profits for the past 20 years and uses this data to estimate the following equation:
Y = 50 + 0.25X, where Y represents the annual profit of the corporation (in $ million) and X
represents the annual advertising expenditure of the corporation (in $ million).
10. Here, slope = 0.25 and intercept = 50. Because the slope of the regression line is 0.25,
this indicates that, on average, for every $1 million increase in advertising expenditure,
profit rises by $0.25 million. Because the intercept is 50, this indicates that with no
advertising, profit would still be $50 million.
This equation can be used to forecast future profits based on planned
advertising expenditure.
e.g. If the corporation plans on spending $10 million on advertising next year, its
expected profits will be as follows:
Y = 50 + 0.25X
Y = 50 + 0.25(10) = 52.5
Hence, with an advertising budget of $10 million next year, profits are expected to be
$52.5 million.
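The forecast above can be reproduced in a few lines of Python. This is a minimal sketch. The intercept 50 and slope 0.25 come from the example, while the function names and the four data points used to demonstrate the least-squares fit are made up for illustration and are not the corporation's 20 years of data.

```python
def predict_profit(advertising: float, intercept: float = 50.0, slope: float = 0.25) -> float:
    """Evaluate Y = intercept + slope * X, with X and Y in $ million."""
    return intercept + slope * advertising


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Forecast from the example: $10 million of advertising next year.
print(predict_profit(10))

# Made-up points lying exactly on the line, to show that the fit recovers it.
spend = [0.0, 4.0, 8.0, 12.0]
profit = [predict_profit(x) for x in spend]
print(fit_line(spend, profit))
```

With real data the points would scatter around the line, and the fitted intercept and slope would be estimates rather than exact values.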
11. Applications of big data
Smarter healthcare
Telecom
Traffic control
Search quality
Multi-channel sales
Homeland security
Trading analytics (e-commerce)
Weather forecasting
Insurance
12. Risk of big data
Organizations can be overwhelmed by the data.
Costs can escalate too fast.
Many sources of big data raise privacy concerns.
13. Benefits of big data
Real-time big data isn’t just a process for storing petabytes or exabytes of
data in a data warehouse; it’s about the ability to make better decisions and
take meaningful action at the right time.
Recent research finds that organizations are using big data to target
customer-centric outcomes, tap into internal data and build a better
information ecosystem.
Big data is already an important part of the database and data analytics
market.
It offers commercial opportunities on a scale comparable to enterprise
software 30 years ago.
14. Future of big data
This is the hottest field in software, statistics, information technology and computer science.
Billions of dollars are being spent on software specializing solely in data management and analytics.
This industry on its own is worth more than $100 billion and is growing at almost 10%
a year, which is roughly twice as fast as the software business as a whole.
In Feb 2016, the open source analyst firm Wikibon released the first market
forecast for big data, listing $5.1B revenue in 2013 with growth to $53.1B in 2017.
The McKinsey Global Institute estimates that data volume is growing 40% per year,
and will grow 44 times between 2009 and 2020.
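The growth figures above can be checked with compound-growth arithmetic. The sketch below uses only the numbers quoted on this slide; the function names are illustrative. Compounding 40% a year over the 11 years from 2009 to 2020 gives roughly a 40-fold increase, the same order of magnitude as the 44 times quoted.

```python
def compound_growth_factor(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


def implied_annual_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Data volume: 40% per year from 2009 to 2020.
print(compound_growth_factor(0.40, 2020 - 2009))

# Market forecast: $5.1B in 2013 growing to $53.1B in 2017.
print(implied_annual_rate(5.1, 53.1, 2017 - 2013))
```

The second call shows the yearly growth rate implied by the quoted revenue forecast, which makes the two endpoints easier to compare with the "almost 10% a year" figure for the wider industry.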