The Underlying Math
So far we’ve seen that the beauty of logistic regression is that it outputs values bounded by 0 and 1; hence they can be directly interpreted as probabilities. Let’s get into the math behind it a bit. You want a function that takes the data and transforms it into a single value bounded inside the closed interval [0, 1]. For an example of a function bounded between 0 and 1, consider the inverse-logit function shown in Figure 5-2.
$$P(t) = \operatorname{logit}^{-1}(t) \equiv \frac{1}{1 + e^{-t}} = \frac{e^{t}}{1 + e^{t}}$$
Figure 5-2. The inverse-logit function
Logit Versus Inverse-logit
The logit function takes x values in the range (0, 1) and transforms them to y values along the entire real line:
$$\operatorname{logit}(p) = \log\frac{p}{1 - p} = \log p - \log(1 - p)$$
The inverse-logit does the reverse, and takes x values along the real line and transforms them to y values in the range (0, 1).
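To make the pair concrete, here is a minimal Python sketch (not from the text; the names logit and inv_logit are just illustrative) that implements both functions and checks numerically that they undo each other:

```python
import numpy as np

def logit(p):
    """Map a probability in (0, 1) to a value on the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(t):
    """Map any real number t back to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# The two functions undo each other (up to floating-point rounding).
for p in [0.01, 0.25, 0.5, 0.9]:
    t = logit(p)
    print(f"p = {p:.2f}  logit(p) = {t:+.3f}  inv_logit(logit(p)) = {inv_logit(t):.2f}")
```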
As t gets large, e^{-t} gets close to zero, so the denominator approaches 1 and the function approaches 1; as t gets very negative, e^{-t} gets huge, so the denominator is large, which makes the function close to zero. So that’s the inverse-logit function, which you’ll use to begin deriving a logistic regression model. In order to model the data, you need to work with a slightly more general function that expresses the relationship between the data and a probability of a click. Start by defining:
$$P(c_i \mid x_i) = \left[\operatorname{logit}^{-1}(\alpha + \beta^{\tau} x_i)\right]^{c_i} \cdot \left[1 - \operatorname{logit}^{-1}(\alpha + \beta^{\tau} x_i)\right]^{1 - c_i}$$
Here c_i is the label or class (clicked or not), and x_i is the vector of features for user i. Observe that c_i can only be 1 or 0, which means that if c_i = 1, the second term cancels out and you have:
$$P(c_i = 1 \mid x_i) = \frac{1}{1 + e^{-(\alpha + \beta^{\tau} x_i)}} = \operatorname{logit}^{-1}(\alpha + \beta^{\tau} x_i)$$
And similarly, if c_i = 0, the first term cancels out and you have:
$$P(c_i = 0 \mid x_i) = 1 - \operatorname{logit}^{-1}(\alpha + \beta^{\tau} x_i)$$
To make this a linear model in the outcomes c_i, take the log of the odds ratio:
$$\log\left[\frac{P(c_i = 1 \mid x_i)}{1 - P(c_i = 1 \mid x_i)}\right] = \alpha + \beta^{\tau} x_i$$
Which can also be written as:
$$\operatorname{logit}\bigl(P(c_i = 1 \mid x_i)\bigr) = \alpha + \beta^{\tau} x_i$$
If it feels to you that we went in a bit of a circle here (this last equation was also implied by earlier equations), it’s because we did. The point of this was to show you how to go back and forth between the probabilities and the linearity.
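As a quick check on that back-and-forth, a few lines of Python (again with made-up parameter values) show that applying the inverse-logit to the linear predictor and then taking the logit of the resulting probability recovers the same linear quantity:

```python
import numpy as np

def inv_logit(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    return np.log(p / (1 - p))

# Hypothetical parameters and features for one user.
alpha, beta = -4.0, np.array([0.8, -0.3, 1.2])
x_i = np.array([1.0, 0.0, 1.0])

linear = alpha + beta @ x_i    # the linear side: alpha + beta' x_i
p = inv_logit(linear)          # forward: linearity -> probability
recovered = logit(p)           # back: probability -> linearity
print(linear, p, recovered)    # linear and recovered agree up to rounding
```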
So the logit of the probability that user i clicks on the shoe ad is modeled as a linear function of the features, which were the URLs that user i visited. This model is called the logistic regression model. The parameter α is what we call the base rate, or the unconditional probability of “1” or “click” knowing nothing more about a given user’s
feature vector x_i. In the case of measuring the likelihood of an average user clicking on an ad, the base rate would correspond to the click-through rate, i.e., the tendency over all users to click on ads. This is typically on the order of 1%.
If you had no information about your specific situation except the base rate, the average prediction would be given by just α:
$$P(c_i = 1) = \frac{1}{1 + e^{-\alpha}}$$
The variable β defines the slope of the logit function. Note that in general it’s a vector that is as long as the number of features you are using for each data point. The vector β determines the extent to which certain features are markers for increased or decreased likelihood to click on an ad.
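A short sketch (illustrative numbers only, assuming the roughly 1% click-through rate mentioned above) shows how the base rate pins down α and how a coefficient vector β shifts the probability for a particular user:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(t):
    return 1.0 / (1.0 + np.exp(-t))

# A click-through rate of about 1% fixes the base rate alpha.
ctr = 0.01
alpha = logit(ctr)                 # roughly -4.6
print(alpha, inv_logit(alpha))     # inv_logit(alpha) recovers 0.01

# Made-up coefficients: a positive entry marks a feature associated with a
# higher click probability, a negative entry with a lower one.
beta = np.array([0.9, -0.4])
x_i = np.array([1.0, 1.0])         # this hypothetical user has both features
print(inv_logit(alpha + beta @ x_i))
```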
Estimating α and β
Your immediate modeling goal is to use the training data to find the best choices for α and β. In general you want to solve this with maximum likelihood estimation and use a convex optimization algorithm, because the negative log-likelihood is convex; you can’t just use derivatives and vector calculus like you did with linear regression because the likelihood is a complicated function of your data, and in particular there is no closed-form solution.
Denote by Θ the pair (α, β). The likelihood function L is defined by:
$$L(\Theta \mid X_1, X_2, \dots, X_n) = P(X \mid \Theta) = P(X_1 \mid \Theta) \cdots P(X_n \mid \Theta)$$
where you are assuming the data points X_i are independent, and i = 1, ..., n represent your n users. This independence assumption corresponds to saying that the click behavior of any given user doesn’t affect the click behavior of all the other users; in this case, “click behavior” means “probability of clicking.” It’s a relatively safe assumption.
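Because of that independence, the likelihood factors into a product over users, and in practice you work with its logarithm, a sum over users. The sketch below (made-up toy data, using the logistic-regression form of P(X_i | Θ) derived above) computes the negative log-likelihood that a convex optimizer would minimize:

```python
import numpy as np

def inv_logit(t):
    return 1.0 / (1.0 + np.exp(-t))

def neg_log_likelihood(alpha, beta, X, c):
    """-log L(alpha, beta | data), summed over independent users.

    X is an (n, d) array of feature vectors; c is an array of 0/1 labels.
    """
    p = inv_logit(alpha + X @ beta)                        # P(c_i = 1 | x_i) for each user
    return -np.sum(c * np.log(p) + (1 - c) * np.log(1 - p))

# Toy data: four users, two features each, with invented labels.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
c = np.array([1, 0, 1, 0])
print(neg_log_likelihood(-1.0, np.array([2.0, -1.0]), X, c))
```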