Face recognition

FaceNet: A Unified Embedding for Face
Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, James Philbin
(Submitted on 12 Mar 2015 (v1), last revised 17 Jun 2015 (this version, v3))

Introduction
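Despite significant recent advances in face recognition, implementing face verification and recognition efficiently remains a challenge for current approaches.
This system (FaceNet) directly learns a mapping from face images to a compact Euclidean space in which distances correspond directly to a measure of face similarity.
Once this space has been produced, tasks such as face recognition, verification, and clustering can be implemented easily with standard techniques, using the FaceNet embeddings as feature vectors.
Our method uses a deep convolutional network trained to optimize the embedding itself directly, rather than an intermediate bottleneck layer as in previous deep learning approaches.
For training, we use triplets of roughly aligned matching / non-matching face patches, generated with a novel online triplet mining method.
The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128 bytes per face.
On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
On YouTube Faces DB it achieves 95.12%.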
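
The triplet idea in the abstract can be illustrated with a small sketch. This is not the paper's training code: it only evaluates a hinge-style triplet loss for one (anchor, positive, negative) triplet of embeddings. The function names and the margin value `alpha=0.2` are illustrative assumptions, and plain Python lists stand in for embedding vectors.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Loss for one triplet: zero once the negative is farther from the
    anchor than the positive by at least the margin alpha."""
    pos = squared_distance(anchor, positive)
    neg = squared_distance(anchor, negative)
    return max(pos - neg + alpha, 0.0)
```

A triplet in which the matching face is already much closer to the anchor than the non-matching face contributes no loss, so training effort concentrates on the triplets that still violate the margin.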

Face Recognition Flow
1. face detection
2. face rotation,
face frontalization,
face normalization
3. face feature extraction
4. face classifier

1. face detection
1-1. dlib
- Extracts faces using 68 facial landmarks.
- Pros: when detection is done with dlib, face rotation & frontalization can be applied as well, which gives high accuracy in face prediction.
- Cons: this method is not fast, so it is not suitable when a quick real-time response is required.

import cv2
import dlib
import numpy as np
import matplotlib.pyplot as plt
from imutils.face_utils import FaceAligner, rect_to_bb

# image_size, margin and test_file are set elsewhere.
model_dir = '/test/'
down_land68_url = 'http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2'
land68_file = 'shape_predictor_68_face_landmarks.dat.bz2'
# Download land68_file from down_land68_url and decompress it into model_dir first.
predictor = dlib.shape_predictor(model_dir + land68_file.replace('.bz2', ''))
detector = dlib.get_frontal_face_detector()
fa = FaceAligner(predictor, desiredFaceWidth=image_size)
# Load as RGB so the frame can be shown with matplotlib as-is.
frame = cv2.cvtColor(cv2.imread(test_file), cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
img_size = frame.shape[:2]  # (height, width)
bounding_boxes = detector(gray, 2)  # upsample the image twice before detecting
for bounding_box in bounding_boxes:
    x, y, w, h = rect_to_bb(bounding_box)
    # Grow the box by margin / 2 on each side and clip it to the image.
    bb = np.zeros(4, dtype=np.int32)
    bb[0] = np.maximum(x - margin / 2, 0)
    bb[1] = np.maximum(y - margin / 2, 0)
    bb[2] = np.minimum(x + w + margin / 2, img_size[1])
    bb[3] = np.minimum(y + h + margin / 2, img_size[0])
    # dlib rectangle of the padded box, e.g. for FaceAligner.align
    rect = dlib.rectangle(left=int(bb[0]), top=int(bb[1]),
                          right=int(bb[2]), bottom=int(bb[3]))
plt.imshow(frame)
plt.show()
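
The margin handling in the snippet above can be isolated into a small helper. This is a sketch under assumptions: the helper name `expand_and_clip` is hypothetical, the box is given as (x, y, w, h), and the margin is split evenly between the two sides of each axis.

```python
def expand_and_clip(x, y, w, h, margin, img_width, img_height):
    """Grow an (x, y, w, h) box by margin // 2 on every side and clip it
    to the image. Returns (left, top, right, bottom)."""
    half = margin // 2
    left = max(x - half, 0)
    top = max(y - half, 0)
    right = min(x + w + half, img_width)
    bottom = min(y + h + half, img_height)
    return left, top, right, bottom
```

For example, `expand_and_clip(10, 20, 100, 100, 32, 120, 200)` returns `(0, 4, 120, 136)`: the left and right edges are clipped to the image, while the top and bottom edges simply move outward by 16 pixels.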

1-2. HOG (Histogram of Oriented Gradients)
- Uses features of the prominent parts of a face to extract regions that have similar features.
- Pros: developed in 2005; extracts faces quickly and reliably.
- Cons: fast, but less accurate than MTCNN.
import face_recognition
bounding_boxes = face_recognition.face_locations(
    frame, number_of_times_to_upsample=0, model='hog')

1-3. CNN
- Uses features of the prominent parts of a face to extract regions that have similar features.
- Pros: extracts faces quickly.
- Cons: the extracted region fluctuates irregularly and by large amounts.
import face_recognition
bounding_boxes = face_recognition.face_locations(
    frame, number_of_times_to_upsample=0, model='cnn')

1-4. MTCNN
- Multi-task Cascaded Convolutional Networks: extracts faces with a model built from three cascaded convolutional networks (pnet, rnet, onet).
- Pros: high accuracy in face extraction.
- Cons: rather slow for video processing.

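# Import detect_face from facenet (assumes facenet's src/ directory is on PYTHONPATH).
import tensorflow as tf
from align import detect_face
threshold = [0.6, 0.7, 0.7]  # thresholds for the three stages (pnet, rnet, onet)
factor = 0.9  # image pyramid scale factor (the original note mentions 0.709)
# config (a tf.ConfigProto) and minsize (minimum face size in pixels) are set elsewhere.
with tf.Session(config=config) as sess:
    pnet, rnet, onet = detect_face.create_mtcnn(sess, None)
    bounding_boxes, _ = detect_face.detect_face(frame, minsize, pnet, rnet, onet, threshold, factor)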

1-5. Result
- When speed is not a major concern, MTCNN showed the highest accuracy.
For real-time face detection, however, where speed matters and moderate accuracy is enough, the HOG method looked like the better choice.
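
A speed comparison of this kind can be reproduced with a small timing harness. This is a sketch, not the measurement used for the slides: `detect_fn` is a hypothetical callable that wraps any one of the detectors above and takes a single frame, and `frames` must be a non-empty list.

```python
import time


def average_detection_time(detect_fn, frames, repeats=3):
    """Mean wall-clock seconds that detect_fn spends per frame."""
    start = time.perf_counter()
    for _ in range(repeats):
        for frame in frames:
            detect_fn(frame)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(frames))
```

Running it once per detector on the same frames gives per-frame times that can be weighed against the accuracy observed for each method.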

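2. face normalization
- Normalizing the varied images of the same face improves accuracy.
import cv2
import facenet  # from the facenet repository
# boxes: detected face boxes as (left, top, right, bottom)
cropped = frame[boxes[0][1]:boxes[0][3], boxes[0][0]:boxes[0][2], :]
# scipy.misc.imresize has been removed from SciPy; cv2.resize with bilinear interpolation replaces it.
aligned = cv2.resize(cropped, (self.image_size, self.image_size), interpolation=cv2.INTER_LINEAR)
prewhitened = facenet.prewhiten(aligned)
prewhitened_reshape = prewhitened.reshape(-1, self.image_size, self.image_size, 3)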
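
The prewhitening step can be sketched in plain Python. The version below is an assumption about what `facenet.prewhiten` does (shift to zero mean, scale by the standard deviation, with a floor on the divisor), written for a flat list of pixel values rather than an image array; check the library source before relying on the exact formula.

```python
import math
import statistics


def prewhiten(pixels):
    """Scale a flat list of pixel values to zero mean and unit variance."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    # Floor the divisor so an almost uniform image does not divide by ~0.
    std_adj = max(std, 1.0 / math.sqrt(len(pixels)))
    return [(p - mean) / std_adj for p in pixels]
```

The point of the step is that two crops of the same face taken under different brightness or contrast end up on a comparable scale before they reach the embedding network.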

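2. face rotation
- Rotates the face so that the eyes lie on a horizontal line, shifting the face box coordinates accordingly.
predictor = dlib.shape_predictor(model_dir + land68_file.replace('.bz2', ''))
fa = FaceAligner(predictor, desiredFaceWidth=image_size)
# rect is the dlib.rectangle of the detected face; keep the top-left 160x160 region.
aligned = fa.align(frame, gray, rect)[:160, :160, :]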
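
The geometry behind the rotation can be shown without any library. This sketch only computes how far the line through the two eye centres is tilted from horizontal; the function name is hypothetical, the eye positions are assumed to be (x, y) pixel coordinates with the first eye on the left side of the image, and the actual image rotation is left to the aligner.

```python
import math


def eye_rotation_angle(left_eye, right_eye):
    """Tilt of the eye line in degrees, relative to the horizontal axis.

    left_eye and right_eye are (x, y) points in image coordinates.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))
```

An angle of 0 means the eyes are already level; an aligner rotates the image about a point between the eyes to cancel any other value.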

2. face frontalization
- Transforms the face so that it looks straight ahead.

3. face feature extraction
- Load a pre-trained model and extract the 128 features of the face in each new image.

- Load a model pre-trained with Inception-ResNet-v1 and extract the 128 features of the image.
This model was trained as a softmax classifier.
Currently, the best results are achieved by training the model as a classifier with the addition of Center loss.
Details on how to train a model as a classifier can be found on the page Classifier training of Inception-ResNet-v1.

import tensorflow as tf
import facenet  # from the facenet repository

with tf.Graph().as_default():
    with tf.Session() as sess:
        facenet.load_model('pre_train_model_path')
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        # phase_train is False because the model is used for inference only.
        feed_dict = {images_placeholder: prewhitened_reshape, phase_train_placeholder: False}
        emb = sess.run(embeddings, feed_dict=feed_dict)

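4. face feature classifier
4-1. Classifying directly with an SVM
When training the SVM, pair each category 1:1 with a label name.
Give the 128 feature values extracted from each image as x and the label name as y.
With a model built this way, the features of a new image are fed directly into the model, which returns the category.
Pros: because the model overfits to the people in the target gallery, a very accurate model can be trained.
Cons: many images are needed per category (about 300).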

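from sklearn.svm import SVC

# emb_array[0] = [1,3,5,..]  # 128-element ndarray: the features of image 0
# labels[0] = [image0]       # label of image 0
# utils.get_images_labels_pair is the author's own helper (not shown).
emb_array, labels = utils.get_images_labels_pair(emb_array, labels, dataset)
model = SVC(kernel='linear', probability=True)
model = model.fit(emb_array, labels)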

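4-2. Deciding by simple image distance alone
Subtract the features of the gallery image from those of the image to predict, and recommend the category whose distance is closest to 0 as the best match.
Prediction is fast, but the accuracy of the results was not high.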
# Print distance matrix
print('Distance matrix')
print('    ', end='')
for i in range(nrof_images):
    print('    %1d     ' % i, end='')
print('')
for i in range(nrof_images):
    print('%1d  ' % i, end='')
    for j in range(nrof_images):
        dist = np.sqrt(np.sum(np.square(np.subtract(emb[i,:], emb[j,:]))))
        print('  %1.4f  ' % dist, end='')
    print('')
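
The distance-only decision can be sketched as a nearest-neighbour lookup. The code below is an illustration under assumptions: `gallery` is a hypothetical dict that maps each label to a single embedding, and plain Python sequences stand in for the 128-value feature vectors.

```python
import math


def nearest_gallery_match(query, gallery):
    """Return (label, distance) of the gallery embedding closest to query.

    gallery maps a label to one embedding (a sequence of floats).
    """
    best_label, best_dist = None, math.inf
    for label, emb in gallery.items():
        dist = math.dist(query, emb)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

This version always returns the closest label, however far away it is, which is one reason a distance-only rule can misclassify faces that are not in the gallery.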

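4-3. Classification by the Same or Different method
Training uses three images at a time.
Image 1 is the anchor, the reference image.
Image 2 shows the same person as image 1.
Image 3 shows a different person from image 1.
The 128 features of image 1 * the 128 features of image 2 are labeled "same",
and the 128 features of image 1 * the 128 features of image 3 are labeled "different".
Training on these pairs yields a model that takes the 128 features of two images and decides
whether they show the same person, which can then be used for classification.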
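
How one triplet becomes two training samples can be sketched as follows. This assumes that the `*` between two feature vectors means an element-wise product, which matches the later line `embv = self.emb_array * emb`; the function names and the 1/0 label encoding are illustrative assumptions.

```python
def pair_feature(emb_a, emb_b):
    """Element-wise product of two equal-length embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    return [a * b for a, b in zip(emb_a, emb_b)]


def triplet_to_training_pairs(anchor, positive, negative):
    """Turn one triplet into two labelled samples.

    Label 1 means "same person", label 0 means "different person".
    """
    return [
        (pair_feature(anchor, positive), 1),
        (pair_feature(anchor, negative), 0),
    ]
```

Each sample keeps the length of a single embedding, so a binary classifier trained on such samples takes one 128-value vector per image pair.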

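4-3-1. Build your own gallery: sort images of the real people you want to classify into one category per person.
These images are called the gallery. Extract the face from each gallery image,
compute its features, and save them as a numpy file.
To handle Unknown faces, also prepare a Background Gallery.
4-3-2. Build a model that decides whether two images show the same person or not.
We first built this model with an SVM, but because of speed problems
it is more effective to train a CNN model with a single layer.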

import os
import pickle
import numpy as np

# Model Load
classifier_filename_exp = os.path.expanduser(self.classifier_filename)
with open(classifier_filename_exp, 'rb') as infile:
    (self.model, self.class_names) = pickle.load(infile)
print('load classifier file-> %s' % classifier_filename_exp)
print('')
# ... Gallery & video image feature extraction
# Element-wise product of every gallery embedding with the embedding of the video frame
embv = self.emb_array * emb
predictions = self.model.predict_proba(embv)
# Column 0 is read as the "same person" probability for each gallery image.
best_class_indices = [self.emb_labels[np.argmax(predictions, axis=0)[0]]]
best_class_probabilities = np.amax(predictions, axis=0)
# Log up to prediction_cnt candidates above prediction_max, most probable first.
parray = []
log_cnt = 0
for pcnt in predictions[:, 0].argsort()[::-1]:
    if (self.prediction_cnt > log_cnt and predictions[pcnt][0] < 1
            and predictions[pcnt][0] > self.prediction_max):
        parray.append(str(predictions[pcnt][0])[:7] + '_' + self.class_names[self.emb_labels[pcnt]])
        log_cnt += 1
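
The final decision can be sketched as picking the most probable gallery match and falling back to Unknown. This is an illustration, not the author's code: `pick_identity` and its parameters are hypothetical, and the `threshold` plays the role that `prediction_max` appears to play above. With a Background Gallery, the background entries would simply carry the label "Unknown" in `gallery_labels`.

```python
def pick_identity(same_probs, gallery_labels, threshold=0.5, unknown="Unknown"):
    """Return the label of the most probable gallery match, or unknown.

    same_probs[i] is the classifier's "same person" probability for the
    pair (gallery image i, query image).
    """
    if not same_probs:
        return unknown
    best = max(range(len(same_probs)), key=same_probs.__getitem__)
    if same_probs[best] < threshold:
        return unknown
    return gallery_labels[best]
```

For instance, with probabilities `[0.1, 0.9, 0.3]` and labels `['a', 'b', 'c']` the function returns `'b'`, while a query whose best probability stays below the threshold is reported as Unknown.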
