[Machine Learning] 8. Support Vector Machines (feat. R Code)

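Advantages

Can be used for both classification and numeric prediction problems
Not overly influenced by noisy data
Less prone to overfitting
Easier to use than neural networks

Disadvantages

Finding the best model requires testing many combinations of kernels and parameters
Training can be slow
Produces a complex black-box model that is hard to interpret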

R Code

Support Vector Classifier

set.seed(1)
x=matrix(rnorm(20*2), ncol=2)
y=c(rep(-1, 10), rep(1, 10))
x[y==1,]=x[y==1,]+1
plot(x, col=(3-y))

The code above generates observations that belong to two classes.

library(e1071)

dat=data.frame(x=x, y=as.factor(y))
svmfit=svm(y~., data=dat, kernel='linear', cost=10, scale=FALSE)

plot(svmfit, dat)

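The svm() function fits a support vector classifier when the argument kernel='linear' is used.
The cost argument specifies the cost of a margin violation. When cost is small, the margin is wide and many support vectors lie on the margin or violate it. When cost is large, the margin is narrow and few support vectors lie on the margin or violate it.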
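To make the role of cost a bit more concrete, here is a minimal sketch of the soft-margin objective that C-classification minimizes: 0.5 * ||w||^2 plus cost times the sum of the margin violations (hinge losses). The hyperplane (w, b) below is a hand-picked, hypothetical choice for illustration only; it is not the one svm() fits.

```r
# Same simulated data as in the first code block
set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1

# Hypothetical hyperplane chosen by hand (an assumption, not a fitted model)
w <- c(1, 1)
b <- -1

# y * f(x) >= 1 means the point is on the correct side and outside the margin
score <- as.vector(y * (x %*% w + b))
hinge <- pmax(0, 1 - score)   # size of each margin violation

# Soft-margin objective: margin-width term + cost-weighted violations
objective <- function(cost) 0.5 * sum(w^2) + cost * sum(hinge)

sum(hinge > 0)                 # number of points that violate the margin
sapply(c(0.1, 10), objective)  # the same violations weigh 100x more at cost = 10
```

With a small cost the optimizer can afford many violations in exchange for a wide margin; with a large cost it is pushed toward a narrow margin with few violations.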

svmfit$index
## [1]  1  2  5  7 14 16 17
summary(svmfit)
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10, 
##    scale = FALSE)
##
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  10 
##
## Number of Support Vectors:  7
##  ( 4 3 )
##
## Number of Classes:  2 
## Levels: 
##  -1 1

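svmfit$index identifies the support vectors; there are seven of them.
The summary confirms that a linear kernel was used with cost=10 and that the seven support vectors are split into four in one class and three in the other.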
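As a rough sketch of what the fitted object holds, the snippet below pulls out the support vectors and rebuilds the linear decision function from them. It assumes the linear kernel with scale=FALSE used above, so that f(x) = x w - rho with w computed from the coefs and SV components of the fit.

```r
library(e1071)

# Recreate the data and the cost = 10 fit from above
set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x = x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)

svmfit$nSV                      # number of support vectors per class
sv_rows <- dat[svmfit$index, ]  # training rows that are support vectors

# Linear kernel, no scaling: weight vector is a weighted sum of support vectors
w <- t(svmfit$coefs) %*% svmfit$SV           # 1 x 2 matrix
manual <- as.vector(x %*% t(w) - svmfit$rho)

# Compare with the decision values reported by predict()
pred <- predict(svmfit, dat, decision.values = TRUE)
reported <- as.vector(attr(pred, "decision.values"))
max(abs(manual - reported))     # should be close to zero
```

Only the support vectors enter w, which is why observations far from the margin do not affect the classifier.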

svmfit=svm(y~., data=dat, kernel='linear', cost=0.1, scale=FALSE)
plot(svmfit, dat)
svmfit$index
## [1]  1  2  3  4  5  7  9 10 12 13 14 15 16 17 18 20

With a smaller value of the cost parameter the margin is wider, so we obtain more support vectors (16 here, compared with 7 at cost=10).

set.seed(1)
tune.out<-tune(svm, y~., data=dat, kernel='linear',
	ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## Parameter tuning of ‘svm’:
##
## - sampling method: 10-fold cross validation 
##
## - best parameters:
##  cost
##   0.1
##
## - best performance: 0.05 
##
## - Detailed performance results:
##    cost error dispersion
## 1 1e-03  0.55  0.4377975
## 2 1e-02  0.55  0.4377975
## 3 1e-01  0.05  0.1581139
## 4 1e+00  0.15  0.2415229
## 5 5e+00  0.15  0.2415229
## 6 1e+01  0.15  0.2415229
## 7 1e+02  0.15  0.2415229

The e1071 library includes a built-in function, tune(), for cross-validation. By default, tune() performs 10-fold cross-validation over the set of models of interest.
Here cost=0.1 gives the lowest cross-validation error rate (0.05).
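For intuition about what 10-fold cross-validation does, here is a hand-rolled sketch over the same cost grid. The fold assignment and the helper cv_error() are my own illustrative choices, so the error rates will not necessarily match the tune() output above.

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x = x, y = as.factor(y))

# Randomly assign each observation to one of 10 folds (2 observations per fold)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))

# Hypothetical helper: average held-out misclassification rate for one cost
cv_error <- function(cost) {
  fold_err <- sapply(1:k, function(i) {
    fit <- svm(y ~ ., data = dat[folds != i, ], kernel = "linear", cost = cost)
    mean(predict(fit, dat[folds == i, ]) != dat$y[folds == i])
  })
  mean(fold_err)
}

costs <- c(0.001, 0.01, 0.1, 1, 5, 10, 100)
cv <- sapply(costs, cv_error)
data.frame(cost = costs, error = cv)
costs[which.min(cv)]   # cost with the lowest estimated error
```

Each model is trained on nine folds and scored on the one left out, and the cost with the lowest average held-out error is selected.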

bestmod=tune.out$best.model
summary(bestmod)
## Call:
## best.tune(method = svm, train.x = y ~ ., data = dat, ranges = list(cost = c(0.001, 
##     0.01, 0.1, 1, 5, 10, 100)), kernel = "linear")
##
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##       cost:  0.1 
##
## Number of Support Vectors:  16
##  ( 8 8 )
##
## Number of Classes:  2 
##
## Levels: 
##  -1 1

The tune() function stores the best model it finds, which can be accessed as tune.out$best.model.

xtest=matrix(rnorm(20*2), ncol=2)
ytest=sample(c(-1, 1), 20, replace=TRUE)
xtest[ytest==1,]=xtest[ytest==1,]+1
testdat=data.frame(x=xtest, y=as.factor(ytest))

ypred=predict(bestmod, testdat)
table(predict=ypred, truth=testdat$y)
##        truth
## predict -1 1
##      -1  9 1
##      1   2 8

The predictions on the test data use the best model obtained through cross-validation.
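The test error rate can be read straight off the confusion matrix. The sketch below simply re-enters the four counts from the table above rather than refitting the model.

```r
# Confusion matrix copied from the output above (rows = predicted, columns = truth)
conf <- matrix(c(9, 2, 1, 8), nrow = 2,
               dimnames = list(predict = c("-1", "1"), truth = c("-1", "1")))

# Off-diagonal cells are the misclassified test observations
n_wrong <- sum(conf) - sum(diag(conf))
error_rate <- n_wrong / sum(conf)
error_rate   # 3 of the 20 test observations are misclassified
```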

Support Vector Machine

set.seed(1)
x=matrix(rnorm(200*2), ncol=2)
x[1:100,]=x[1:100,]+2
x[101:150,]=x[101:150,]-2
y=c(rep(1,150), rep(2,50))
dat=data.frame(x=x, y=as.factor(y))
plot(x, col=y)

train=sample(200, 100)
svmfit=svm(y~., data=dat[train,], kernel='radial', gamma=1, cost=1)
plot(svmfit, dat[train,])

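To fit an SVM with a polynomial kernel, use kernel='polynomial'; to fit an SVM with a radial kernel, use kernel='radial'. In the former case the degree argument specifies the degree of the polynomial kernel, and in the latter case gamma specifies the gamma value of the radial basis kernel.
Plotting the data makes it clear that the class boundary is nonlinear.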
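To see what these two kernels compute, here is a small sketch that evaluates them by hand. The formulas follow the kernel definitions in the e1071 svm() documentation; the function names and default argument values are my own for illustration.

```r
# Kernel forms as documented for e1071::svm (u and v are feature vectors).
# Function names and defaults here are illustrative, not part of e1071.
poly_kernel <- function(u, v, degree = 3, gamma = 1, coef0 = 0) {
  (gamma * sum(u * v) + coef0)^degree
}
radial_kernel <- function(u, v, gamma = 1) {
  exp(-gamma * sum((u - v)^2))
}

u <- c(1, 2)
v <- c(2, 0)

poly_kernel(u, v, degree = 2, coef0 = 1)  # (1 * 2 + 1)^2 = 9
radial_kernel(u, v, gamma = 1)            # exp(-5), squared distance is 1 + 4 = 5
radial_kernel(u, u, gamma = 1)            # a point compared with itself gives 1

# Larger gamma makes similarity decay faster with distance (a more local fit)
vals <- sapply(c(0.5, 1, 2), function(g) radial_kernel(u, v, gamma = g))
vals
```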

svmfit=svm(y~., data=dat[train,], kernel='radial', gamma=1, cost=1e5)
plot(svmfit, dat[train,])

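The plot of the first fit (cost=1) shows a fair number of training errors. Increasing cost, as in the cost=1e5 fit above, reduces the number of training errors, but it produces a more irregular decision boundary that risks overfitting the data.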
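One way to check this trade-off numerically is to compare training and held-out error for the two cost values. This is a sketch; the helper err() is my own, and the exact numbers depend on the random train/test split.

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
y <- c(rep(1, 150), rep(2, 50))
dat <- data.frame(x = x, y = as.factor(y))
train <- sample(200, 100)

# Hypothetical helper: fraction of misclassified observations on the given rows
err <- function(fit, rows) mean(predict(fit, dat[rows, ]) != dat$y[rows])

# Fit the radial-kernel SVM at a low and a very high cost
fits <- lapply(c(low = 1, high = 1e5), function(cst) {
  svm(y ~ ., data = dat[train, ], kernel = "radial", gamma = 1, cost = cst)
})

# Training error vs. error on the 100 held-out observations
res <- sapply(fits, function(f) c(train = err(f, train), test = err(f, -train)))
res
```

If the high-cost fit shows a lower training error but a higher test error than the low-cost fit, that is the overfitting pattern described above.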

set.seed(1)
tune.out=tune(svm, y~., data=dat[train,], kernel='radial',
	ranges=list(cost=c(0.1, 1, 10, 100, 1000),
	gamma=c(0.5, 1, 2, 3, 4)))
summary(tune.out)

table(true=dat[-train,'y'], pred=predict(tune.out$best.model, newdata=dat[-train,]))

ROC Curve

library(ROCR)
rocplot<-function(pred, truth, ...){
	predob=prediction(pred, truth)
	perf=performance(predob, 'tpr', 'fpr')
	plot(perf, ...)}

# Fit with gamma=2 and keep the decision values for the ROC curve
svmfit.opt<-svm(y~., data=dat[train,], kernel='radial',
	gamma=2, cost=1, decision.values=TRUE)
fitted<-attributes(predict(svmfit.opt, dat[train,], decision.values=TRUE))$decision.values
rocplot(fitted, dat[train,'y'], main='Training Data')

# More flexible fit (gamma=50), overlaid on the same plot in red
svmfit.flex<-svm(y~., data=dat[train,], kernel='radial',
	gamma=50, cost=1, decision.values=TRUE)
fitted<-attributes(predict(svmfit.flex, dat[train,], decision.values=TRUE))$decision.values
rocplot(fitted, dat[train,'y'], add=TRUE, col='red')
