Data analysis final

Published in: Business, Technology

  1. 1. Data Analysis 1 - The Prediction of Data (20/10/2012). Data Analysis Team 05: Nobuya Yoshizawa, Goshi Fujimoto, Atsuko Chiba, Xu Changjing
  2. 2. Outline: 1. Objectives 2. Hypothesis 3. Analysis process 4. Result 5. Conclusion 6. Possible reasons 7. Role of members, followed by Q&A
  3. 3. 1. Objectives
      • Does investment in the future lead to strong management performance?
      • What is experimental and research (E&R) expense? It is the special expense for studying and researching new products or new technology. In other words, it is an investment in the future.
      [Bar chart: Experimental and Research Expense (KYen/Firm). Bar values in descending order: 40,581; 11,497; 11,203; 8,414; 3,319; 2,369; 2,130; 1,991; 1,628; 1,358; 1,199; 1,147; 1,028; 829; 434; 384. Category labels were not preserved.]
  4. 4. 2. Hypothesis
      When experimental and research expense is high:
      • Gross profit rate is high. New products can be priced high in the short term and so be profitable.
      • Total asset is high. A company that produces new products and technology must hold large total assets.
      • Number of employees is high. Large manufacturing companies in Japan have many employees and own laboratories to produce new products and technology.
  5. 5. 2. Hypothesis: scatter plots against E&R expense. There is a clear relationship between E&R expense and the hypothesis variables. Next, we build the multiple regression model.
  6. 6. 3. Analysis
      Goal: understand the objective data in depth and find its correlation with the other variables.
      1. Overviewing the objective data
      2. Making the correlation matrix
      3. Picking up explanatory variables
      4. Developing the multiple regression model
      5. Improving the multiple regression model
  7. 7. 3-1. Overviewing the objective data
      Overview of E&R expense:
      1. About half of the firms make no investment in E&R.
      2. The other half invest in E&R over a wide range of amounts.
  8. 8. 3-1. Overviewing the objective data
      • 815 companies have E&R expense > 0; 1275 companies have E&R expense = 0.
      1. We are interested only in companies that have experimental and research expense, so we took the 815 of the 2090 companies as the objective data.
      2. We converted E&R expense to log10(E&R expense) and used it as the objective variable, to compress its wide numerical range.
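The selection and transformation described on this slide can be sketched in Python as follows. This is only an illustration, not the team's code (the appendix shows R output); the sample values and the names `er_expense_kyen` and `log_er_expense` are made up for the example.

```python
import math

# Hypothetical E&R expenses in KYen for a handful of firms (not the real data).
er_expense_kyen = [0, 500, 0, 12000, 40000, 0, 1000]

# Keep only firms that report E&R expense; log10(0) is undefined.
positive = [x for x in er_expense_kyen if x > 0]

# Compress the wide numeric range with a base-10 logarithm.
log_er_expense = [math.log10(x) for x in positive]

print(len(positive), "of", len(er_expense_kyen), "firms kept")
print([round(v, 3) for v in log_er_expense])
```

Filtering before taking the logarithm matters: applying `math.log10` to a zero expense would raise an error, which is one practical reason to model only the firms with positive E&R expense.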
  9. 9. 3-2. Making the correlation matrix
      Purpose:
      • Find the explanatory variables that have a strong relationship with E&R expense.
      • Group similar explanatory variables so that the model does not suffer from multicollinearity.
                         TotalAsset  logTotalAsset  CurrentAsset  LongTermAsset  LongTermLiability  logE&Rexpense  …
      TotalAsset         1           0.589          0.603         0.960          0.936              0.426          …
      logTotalAsset      0.589       1              0.637         0.466          0.428              0.777          …
      CurrentAsset       0.603       0.637          1             0.354          0.311              0.529          …
      LongTermAsset      0.960       0.466          0.354         1              0.987              0.313          …
      LongTermLiability  0.937       0.428          0.311         0.987          1                  0.279          …
      logE&Rexpense      0.426       0.777          0.529         0.313          0.279              1              …
      …
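One possible way to group near-duplicate explanatory variables is to flag every pair whose correlation exceeds a cut-off. The Python sketch below does this with the values printed in the matrix on this slide. The 0.9 threshold and the names `corr`, `THRESHOLD` and `collinear_pairs` are assumptions for illustration; the slide does not say which rule the team used. The TotalAsset and LongTermLiability pair is printed as 0.936 in one triangle and 0.937 in the other; the sketch uses 0.936.

```python
# Correlations between explanatory variables, copied from the slide's matrix
# (one entry per pair).
corr = {
    ("TotalAsset", "logTotalAsset"): 0.589,
    ("TotalAsset", "CurrentAsset"): 0.603,
    ("TotalAsset", "LongTermAsset"): 0.960,
    ("TotalAsset", "LongTermLiability"): 0.936,
    ("logTotalAsset", "CurrentAsset"): 0.637,
    ("logTotalAsset", "LongTermAsset"): 0.466,
    ("logTotalAsset", "LongTermLiability"): 0.428,
    ("CurrentAsset", "LongTermAsset"): 0.354,
    ("CurrentAsset", "LongTermLiability"): 0.311,
    ("LongTermAsset", "LongTermLiability"): 0.987,
}

THRESHOLD = 0.9  # assumed cut-off, not stated on the slide

# Pairs above the cut-off carry nearly the same information.
collinear_pairs = sorted(pair for pair, r in corr.items() if abs(r) >= THRESHOLD)

for a, b in collinear_pairs:
    print(f"{a} and {b} are strongly correlated (r = {corr[(a, b)]})")
```

With these values, TotalAsset, LongTermAsset and LongTermLiability end up in one group, so at most one of them would normally enter the regression.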
  10. 10. 3-3. Picking up explanatory variables
      Top variables with a strong relationship to E&R expense. The slide layout that paired each variable with its value was lost, so the two lists are given in their original order:
      • Variables: Log Total Asset, Log Current Asset, Log Depreciation, Log Personnel Expense, Log Number of Employee, Log Note And Account Payable, Log Sales Income, Log Aggregate Value of Listed Stock, Log BreakEvenPoint
      • Values: 0.777, 0.760, 0.741, 0.766, 0.706, 0.756, 0.748, 0.787, 0.697
  11. 11. 3-4. Developing the multiple regression model
      • We developed the multiple regression model from both the hypothesis and a statistical approach.
      • The hypothesis matters most, because the model must be easy to explain and acceptable to the audience.
      • We then searched for the best set of explanatory variables without lowering the t-values or R^2.
      [Diagram: explanatory variables A to F, chosen from the hypothesis and from statistics, feed the objective variable.]
  12. 12. 3-5. Improving the multiple regression model
      An example of an improvement:
      • We found these relationships with E&R expense: total asset has a high negative correlation, and current asset has a high positive correlation.
      • We therefore replaced total asset with the current asset ratio (= Current asset / Total asset), which has a very high positive correlation.
      The current asset ratio explains E&R expense better than total asset because:
      • E&R expense is counted as a deferred current asset.
      • Companies with E&R expense are more active than those without.
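The derived variable on this slide is a simple ratio. A minimal Python sketch, with a hypothetical function name and made-up balance-sheet figures, could look like this:

```python
def current_asset_ratio(current_asset: float, total_asset: float) -> float:
    """Return current asset / total asset, the ratio defined on the slide."""
    if total_asset <= 0:
        raise ValueError("total asset must be positive")
    return current_asset / total_asset


# Hypothetical figures in KYen, not taken from the data set.
ratio = current_asset_ratio(current_asset=600_000, total_asset=1_500_000)
print(ratio)
```

Because the ratio is scale-free, it separates the composition of a firm's assets from the firm's size, which other variables in the model (for example the number of employees) already capture.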
  13. 13. 4. Result
      Variable                                    Normalized coefficient  P-value
      Gross profit rate                           0.258                   P<0.001
      Current asset ratio to total asset          0.106                   P<0.001
      Log Number of employee                      0.090                   P<0.05
      Log Inventory product                       0.076                   P<0.001
      Percentage of export                        0.088                   P<0.001
      Average salary                              0.188                   P<0.001
      Consolidated income ratio to single income  0.092                   P<0.001
      Investment security                         0.073                   P<0.01
      Personnel expense                           -0.139                  P<0.001
      Log Note and account receivable             0.111                   P<0.01
      Log Depreciation                            0.489                   P<0.001
  14. 14. 5. Conclusion I
      • Model based on hypothesis: R^2 = 0.5000
      • Improved model: R^2 = 0.750, with smaller residuals and a strong fit
      Common characteristics:
      • High profit rate, total asset and cash flow
      • High investment in experimental installations
      • High number of employees and salary
      • Large global companies
  15. 15. 5. Conclusion II
      We verified the three hypothesis variables and one further variable obtained by improving the multiple regression model (refer to slide 11).
      When the experimental and research expense is high:
      Variable               Result    Correlation
      Gross profit rate      Verified  The Capital Stock is correlated
      Total asset            Verified  The total asset is correlated
      # of employee is high  Verified  The # of employee is correlated
      Current asset ratio    Verified  The current asset ratio is correlated
  16. 16. 6. Possible reasons
      • IT bubble era in 1996
        - NEC and Fujitsu spent on experimental and research expenses in 1996.
        - During the IT bubble era, IT companies invested in market research and advanced technology to differentiate themselves from domestic and foreign competitors.
      • Japanese manufacturing style
        - Large companies, such as electricity, gas or exporting firms, could afford to have laboratories and to spend on experimental and research expense.
  17. 17. 7. Role of members
      Name                      Role
      Fujimoto Goshi (Leader)   Facilitator, analyzing data
      Xu Changjing (Co-leader)  Analyzing data
      Chiba Atsuko              Analyzing data
      Yoshizawa Nobuya          Preparing presentation slides
  18. 18. Thank you for your attention. Q&A
  19. 19. Appendix – simple model
      lm(formula = logExperimentalAndResearchExpense ~ logNumberOfEmployee + logTotalAsset + GrossProfitRate)
      Residuals:
           Min        1Q    Median        3Q       Max
      -2.02654  -0.27310   0.09164   0.37059   1.13517
      Coefficients:
                            Estimate  Std. Error  t value  Pr(>|t|)
      (Intercept)          -2.674841    0.158979  -16.825   < 2e-16 ***
      logNumberOfEmployee   0.544842    0.082013    6.643  5.62e-11 ***
      logTotalAsset         0.708790    0.070862   10.002   < 2e-16 ***
      GrossProfitRate       0.014168    0.001247   11.365   < 2e-16 ***
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 0.4997 on 811 degrees of freedom
      Multiple R-squared: 0.6715, Adjusted R-squared: 0.6703
      F-statistic: 552.5 on 3 and 811 DF, p-value: < 2.2e-16
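One way to read the simple model's output: each t value is the estimate divided by its standard error, and the estimates define a fitted equation for log E&R expense. The Python sketch below checks the first and applies the second. It assumes the log-transformed predictors use base 10, as slide 8 states for E&R expense; the function name `predict_log_er` and the example firm are hypothetical.

```python
# Estimates and standard errors copied from the simple model's lm output.
coef = {
    "(Intercept)": (-2.674841, 0.158979),
    "logNumberOfEmployee": (0.544842, 0.082013),
    "logTotalAsset": (0.708790, 0.070862),
    "GrossProfitRate": (0.014168, 0.001247),
}

# t values as printed in the same output, for comparison.
reported_t = {
    "(Intercept)": -16.825,
    "logNumberOfEmployee": 6.643,
    "logTotalAsset": 10.002,
    "GrossProfitRate": 11.365,
}

# t value = estimate / standard error.
t_values = {name: est / se for name, (est, se) in coef.items()}


def predict_log_er(log_employees, log_total_asset, gross_profit_rate):
    """Predicted log E&R expense from the simple model's estimates."""
    return (coef["(Intercept)"][0]
            + coef["logNumberOfEmployee"][0] * log_employees
            + coef["logTotalAsset"][0] * log_total_asset
            + coef["GrossProfitRate"][0] * gross_profit_rate)


# Hypothetical firm: log employees = 3, log total asset = 7, gross profit rate = 25.
example = predict_log_er(3.0, 7.0, 25.0)

for name, t in t_values.items():
    print(f"{name}: computed t = {t:.3f}, reported t = {reported_t[name]}")
print(f"predicted log E&R expense: {example:.3f}")
```

The recomputed t values agree with the printed ones only up to the rounding of the displayed estimates and standard errors, so small differences in the third decimal are expected.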
  20. 20. Appendix – improved model
      Residuals:
           Min        1Q    Median        3Q       Max
      -2.02046  -0.23227   0.07728   0.29399   1.24620
      Coefficients:
                                              Estimate   Std. Error  t value  Pr(>|t|)
      (Intercept)                            -2.301e+00   1.676e-01  -13.731   < 2e-16 ***
      logNumberOfEmployee                     1.553e-01   7.990e-02    1.944  0.052296 .
      logNoteAndAccountReceivabe              1.676e-01   5.764e-02    2.908  0.003743 **
      logInventoryProduct                     6.072e-02   1.624e-02    3.739  0.000198 ***
      logDeprecoation                         6.306e-01   6.513e-02    9.682   < 2e-16 ***
      GrossProfitRate                         1.588e-02   1.145e-03   13.873   < 2e-16 ***
      PerCapitaPersonnelExpenseKYen          -7.497e-05   1.165e-05   -6.437  2.09e-10 ***
      RatioTotalCurrentAsset                  6.073e-01   1.535e-01    3.957  8.25e-05 ***
      PercentageOfExport                      4.642e-03   9.954e-04    4.663  3.65e-06 ***
      ConsolidatedIncomeToSingleIncomeRatio   1.598e-01   3.371e-02    4.740  2.53e-06 ***
      AverageSalary                           3.255e-06   3.971e-07    8.197  9.72e-16 ***
      InvestmentSecurity                      3.755e-06   1.172e-06    3.204  0.001409 **
      Residual standard error: 0.4382 on 803 degrees of freedom
      Multiple R-squared: 0.7499, Adjusted R-squared: 0.7465
      F-statistic: 218.9 on 11 and 803 DF, p-value: < 2.2e-16
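The adjusted R-squared values in the two appendix outputs can be reproduced from the multiple R-squared, the number of firms (815) and the number of predictors, using the standard formula. The Python sketch below is a cross-check rather than part of the original analysis; the function and variable names are made up for the example.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_FIRMS = 815  # firms with E&R expense > 0 (slide 8)

# Simple model: 3 predictors, 811 residual degrees of freedom.
simple = adjusted_r_squared(0.6715, N_FIRMS, 3)
# Improved model: 11 predictors, 803 residual degrees of freedom.
improved = adjusted_r_squared(0.7499, N_FIRMS, 11)

print(round(simple, 4), round(improved, 4))
```

The adjustment penalises the eight extra predictors of the improved model, yet its adjusted value stays well above the simple model's, which supports the claim that the added variables improve the fit.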
