Hubert

	September 3rd, 1987
	+82-10-7476-0340
	kjhyug93@gmail.com
	https://bioinfo-hubert.blogspot.com

SKILLS

I majored in Life Science. To improve my bioinformatics skills, I enjoy studying programming languages (Python, shell script, and so on) and statistics.

Confident that machine learning will play an important role in every field in the future, I am also studying it steadily.

Python/Shell script

WGS/WES/Cancer Genome Analysis

Certificate of Clinical Diagnosis

Machine Learning

Statistics

WORKS

What I'm doing

NGS Analysis

Setting up and maintaining analysis pipelines for human (WGS/WES) and non-human (re-seq) NGS data.

Cancer Panel Analysis

Somatic and germline analysis of targeted cancer panel data to support clinical diagnosis.

Programming

Python and shell script are my main programming languages, and I am also familiar with Linux.

Certification

Experienced in preparing certification documents for CAP, NGS Clinical Lab (MFDS), CE-IVD, and so on.

Statistics

Basic statistical analysis and visualization of various data.

Knowledge Sharing

Sharing knowledge with others through my blog, study groups, online lectures, seminars, and so on.

EXPERIENCES

CAREER

Jan 2021 - Present

Senior Researcher (Team Manager)

Macrogen

(Seoul, Korea)

Management of the clinical data analysis team.

Apr 2017 - Dec 2020

Assistant Researcher

Macrogen

(Seoul, Korea)

Analysis of WGS/WES/cancer panel data. Participated in obtaining certifications (CAP, KIGTE, NGS Clinical Lab, and CE-IVDR) in the Clinical Cancer Genomics team.

Apr 2014 - Mar 2017

Researcher

Macrogen

(Seoul, Korea)

Preprocessing of NGS data (Illumina, PacBio) and analysis of non-human re-sequencing data in the General NGS Analysis team.

Feb 2009 - Aug 2009

Long-term Volunteer

Vitalise

(Southampton, UK)

Providing companionship and social support to guests with disabilities. Assisting staff to offer daily health care services to them.

Mar 2006 - Feb 2013

BSc in Life Science

Handong Global University

(Pohang, Korea)

Received Academic (2006 2nd, 2007 1st) and National (2010 1st; 2011 1st and 2nd) Scholarships. Served as leader of the volunteer circle 'SOUL'.

LATEST POSTS

Summary

A model is only useful if it predicts well, so the fundamental goal is to build a high-quality model. This post therefore looks at how to evaluate the models an algorithm produces.

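* Cross-validating models (11.1)

- Train a model and compute how well it performs using a performance metric (accuracy, squared error, etc.)

- We want a model that works well on new data, not one that works well only on the training set

- Use KFCV (k-fold cross-validation) to compute the final performance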
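The k-fold procedure above can be sketched in plain Python. This is a minimal illustration, not the post's own code: the helper names (`k_fold_indices`, `cross_validate`), the mean-predicting toy model, and the toy data are all assumptions made for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


# Score with mean squared error (lower is better).
def mse(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = list(range(10))
ys = [float(x % 2) for x in xs]
folds = list(k_fold_indices(len(xs), 3))
mean_score, fold_scores = cross_validate(xs, ys, 3, fit_mean, mse)
```

Every sample lands in exactly one test fold, so each score is computed on data the model did not see during training.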

* Creating a baseline regression model (11.2)

- Regression models are evaluated with the coefficient of determination ($ R^{2} $)

- $ R^2 = 1 - \frac{\sum_{i} (y_{i}-\hat{y}_{i})^2}{\sum_{i} (y_{i}-\bar{y})^2} $
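As a worked example of the formula above, here is a small sketch that computes the coefficient of determination by hand. The function name `r_squared` and the toy values are assumptions for illustration. The baseline here is a model that always predicts the mean of the targets, which should score exactly 0.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.5] * 4      # always predict the mean of y_true
perfect_pred = list(y_true)    # predictions that match the targets exactly

baseline_r2 = r_squared(y_true, baseline_pred)  # 0.0
perfect_r2 = r_squared(y_true, perfect_pred)    # 1.0
```

A model is doing useful work only to the extent that its score rises above the 0 obtained by the mean predictor.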

* Creating a baseline classification model (11.3)

- A common way to measure a classifier's performance is to compare how much better it does than random guessing
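A minimal sketch of that comparison follows, assuming two simple baselines: uniform random guessing and always predicting the most frequent training label. The helper names, the labels, and the `model_pred` values are hypothetical and stand in for a real classifier's output.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_guess_baseline(labels, n, seed=0):
    """Predict n labels by drawing uniformly at random from the known classes."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n)]


def majority_baseline(y_train, n):
    """Predict the most frequent training label for every sample."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


y_train = ["a", "a", "a", "b"]
y_test = ["a", "b", "a", "a", "b"]
model_pred = ["a", "b", "a", "b", "b"]  # hypothetical classifier output

random_pred = random_guess_baseline(sorted(set(y_train)), len(y_test))
majority_pred = majority_baseline(y_train, len(y_test))

model_acc = accuracy(y_test, model_pred)        # 4 of 5 correct
majority_acc = accuracy(y_test, majority_pred)  # 3 of 5 correct
random_acc = accuracy(y_test, random_pred)      # depends on the seed
```

If a trained model cannot beat these baselines, its apparent accuracy says little about what it has learned.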

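* Evaluating binary classifier predictions (11.4)

- Assessing the quality of a trained classification model: when running cross-validation with sklearn's cross_val_score function, choose one of the performance metrics via the scoring parameter

1) Accuracy: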
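For reference, accuracy is the share of correct predictions, (TP + TN) / (TP + TN + FP + FN). The sketch below computes it from confusion counts in plain Python. The function names and the toy labels are assumptions for illustration. In scikit-learn the same metric is selected with `scoring="accuracy"`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)  # (3, 3, 1, 1)
acc = accuracy_from_counts(*counts)        # 0.75
```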

* Reducing features on sparse data (9.5)

- Reduce the dimensionality of a sparse feature matrix using TSVD (truncated singular value decomposition)

- TSVD is similar to PCA, but unlike PCA it can be applied to sparse feature matrices

- In natural language processing, TSVD is also called latent semantic analysis (LSA)
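To illustrate why a truncated SVD works on sparse input, the sketch below finds only the leading singular triplet by power iteration, touching only the stored non-zero entries. This is a simplified stand-in and not how scikit-learn's `TruncatedSVD` is implemented. The dict-of-columns row format, the function name, and the toy matrix are assumptions for the example.

```python
import math


def top_singular_triplet(rows, n_cols, n_iter=100):
    """Leading (sigma, u, v) of a sparse matrix given as a list of {col: value} rows.

    Uses power iteration on A^T A and never materialises the dense matrix.
    """
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        # u = A v, using only the non-zero entries of each row
        u = [sum(val * v[j] for j, val in row.items()) for row in rows]
        # w = A^T u
        w = [0.0] * n_cols
        for ui, row in zip(u, rows):
            for j, val in row.items():
                w[j] += val * ui
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(val * v[j] for j, val in row.items()) for row in rows]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


# Toy 4 x 3 sparse matrix with singular values 3, 2, and 1 (last row is all zeros).
rows = [{0: 3.0}, {1: 2.0}, {2: 1.0}, {}]
sigma, u, v = top_singular_triplet(rows, n_cols=3)

# One-dimensional representation of each row: its projection onto v.
reduced = [sigma * ui for ui in u]
```

Keeping k such components instead of one gives the rank-k reduction. Note that no mean-centering step appears here, which is the step in PCA that would turn a sparse matrix dense.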

HI! I'M HUBERT

About Me

Bioinformatician And Data Scientist.


209,923

Raw Data Manipulation

1,167

Clinical Data Analysis
