About Me

Bioinformatician and Data Scientist.

I'm Hubert, a bioinformatician and data scientist who hopes to make a small contribution to humanity.

September 3rd, 1987
+82-10-7476-0340
kjhyug93@gmail.com
https://bioinfo-hubert.blogspot.com

SKILLS

I majored in Life Science. To improve my bioinformatics skills, I enjoy studying programming languages (Python, shell scripting, and so on) and statistics.

Confident that machine learning will play an important role in every field in the future, I am also studying it steadily.

Python/Shell script
WGS/WES/Cancer Genome Analysis
Certificate of Clinical Diagnosis
Machine Learning
Statistics

WORKS

What I'm doing

NGS Analysis

Setting up and maintaining analysis pipelines for human (WGS/WES) and non-human (re-seq) NGS data.

Cancer Panel Analysis

Somatic and germline analysis of targeted cancer panel data to support clinical diagnosis.

Programming

Python and shell script are my main programming languages, and I am also familiar with Linux.

Certification

Experienced in preparing certification documents for CAP, NGS Clinical Lab (MFDS), CE-IVD, and so on.

Statistics

Basic statistical analysis and visualization of various data.

Knowledge Sharing

Sharing knowledge with others through my blog, study groups, online lectures, seminars, and so on.

209,923

Rawdata Manipulation

1,167

Analysis Clinical Data

EXPERIENCES

CAREER
Jan 2021 - Present
Senior Researcher (Team Manager)
Macrogen
(Seoul, Korea)

Managing the clinical data analysis team.

Apr 2017 - Dec 2020
Assistant Researcher
Macrogen
(Seoul, Korea)

Analyzed WGS/WES/cancer panel data. Participated in obtaining certifications (CAP, KIGTE, NGS Clinical Lab, and CE-IVDR) in the Clinical Cancer Genomics team.

Apr 2014 - Mar 2017
Researcher
Macrogen
(Seoul, Korea)

Preprocessed NGS data (Illumina, PacBio) and analyzed non-human re-sequencing data in the General NGS Analysis team.

Feb 2009 - Aug 2009
Long-term Volunteer
Vitalise
(Southampton, UK)

Provided companionship and social support to guests with disabilities, and assisted staff in offering them daily health care services.

Mar 2006 - Feb 2013
BSc in Life Science
Handong Global University
(Pohang, Korea)

Received Academic Scholarships (2006 2nd, 2007 1st) and National Scholarships (2010 1st, 2011 1st and 2nd). Served as leader of the volunteer circle 'SOUL'.

CONTENTS

LATEST POSTS


Summary

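 A model is only useful if its predictive performance is high, so the fundamental goal is to build a high-quality model. This post therefore looks at how to evaluate the models that algorithms produce.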

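    * Cross-validating models (11.1)
        - Train a model and compute how well it works using a performance metric (accuracy, squared error, etc.)
        - The model is expected to work well on new data, not only on the training set
        - Use k-fold cross-validation (KFCV) to compute the final performance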

    * Creating a baseline regression model (11.2)
        - Regression models are evaluated with the coefficient of determination ($ R^{2} $)
        - $ R^2 = 1 - \frac{\sum_{i} (y_{i}-\hat{y}_{i})^2}{\sum_{i} (y_{i}-\bar{y})^2} $

    * Creating a baseline classification model (11.3)
        - A common way to measure a classification model's performance is to compare how much better it does than random guessing

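    * Evaluating binary classifier predictions (11.4)
        - To assess the quality of a trained classification model, choose one of the performance metrics for the scoring parameter when running cross-validation with sklearn's cross_val_score function
            1) Accuracy: 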

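    * Reducing features on sparse data (9.5)
        - Reduce the dimensionality of a sparse feature matrix using truncated singular value decomposition (TSVD)
        - TSVD is similar to PCA but, unlike PCA, has the advantage that it can be applied to sparse feature matrices
        - In natural language processing, TSVD is also called latent semantic analysis (LSA)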
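
As a small worked example of the $ R^{2} $ formula in item 11.2 above, the sketch below computes the coefficient of determination in plain Python. The function name `r_squared` and the toy numbers are assumptions made for illustration; they are not taken from the original post.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of the targets from their mean.
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Toy target values (hypothetical).
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r_squared(y_true, y_true)              # predictions equal the targets
mean_only = r_squared(y_true, [6.0] * 4)         # always predict the mean
close = r_squared(y_true, [2.5, 5.5, 7.0, 9.5])  # small errors

print(perfect, mean_only, close)
```

A model that always predicts the mean of the targets scores 0, and a perfect model scores 1, so $ R^{2} $ can be read as the share of variance the model explains beyond that mean-only baseline.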


Code
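
A minimal sketch of the ideas in the summary, assuming scikit-learn is installed: it runs 5-fold cross-validation with `cross_val_score`, selects accuracy through the `scoring` parameter, and compares a classifier against a random-guessing baseline. The synthetic dataset, the choice of logistic regression, and all variable names are assumptions for illustration rather than the code of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (hypothetical, for illustration only).
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# 5-fold cross-validation with shuffling for a reproducible split.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Scaling lives inside the pipeline so it is refit on each training fold
# and never sees the held-out fold.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Baseline that guesses classes uniformly at random.
baseline = DummyClassifier(strategy="uniform", random_state=0)

# scoring="accuracy" selects the performance metric for each fold.
model_scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
baseline_scores = cross_val_score(baseline, X, y, cv=kfold, scoring="accuracy")

print("model mean accuracy:   ", model_scores.mean())
print("baseline mean accuracy:", baseline_scores.mean())
```

Other metric names can be passed to `scoring` in the same way, for example `"precision"`, `"recall"`, or `"f1"` for a binary classifier.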