Machine Learning | Cong Peng

LinkedInfo.co

The Web should be an open web; all information published on the Web is meant to be shared. LinkedInfo.co is a Web service that uses Semantic Web technologies and Machine Learning to link and share technical articles on the Web.

Text Analysis Service

A text analysis Web service for technical articles that provides topic identification and keyword extraction. The topic identification model is a pre-trained BERT fine-tuned on the LinkedInfo.co dataset and on questions from Stack Overflow.

XGBoost.swift

The first Swift interface for XGBoost, an optimized distributed gradient boosting library that implements Machine Learning algorithms under the Gradient Boosting framework.

Using BERT to perform Topic Tag Prediction for Technical Articles

Introduction: This is a follow-up to the post Multi-label classification to predict topic tags of technical articles from LinkedInfo.co. We will continue the same task using BERT. First, we'll take just the embeddings from BERT and feed them to the same classification model used in the last post, an SVM with a linear kernel. The reason for keeping the SVM is that the dataset is quite small.

A Walk Through of the IEEE-CIS Fraud Detection Challenge

Introduction: This is a brief walk-through of the Kaggle challenge IEEE-CIS Fraud Detection. The process in this post is not meant to compete with the top solutions through extreme feature engineering and a greedy search for the best model and hyper-parameters. It simply walks through the problem and demonstrates a relatively good solution, based on feature analysis and a few experiments that draw on others' methods.

Skin Lesion Classifier

A skin lesion classifier that uses a deep neural network trained on the HAM10000 dataset. An implementation of the ISIC challenge 2018 task 3.

Skin Lesion Image Classification with Deep Convolutional Neural Networks

Introduction: In this post we will show how to do skin lesion image classification with deep neural networks. The image classifier is trained on the HAM10000 dataset, the same problem as task 3 of the International Skin Imaging Collaboration (ISIC) 2018 challenge. The solution in this post is mainly based on some web posts and on methods from the ISIC 2018 leaderboard. The classification model is tested with pretrained ResNet and DenseNet networks and implemented with PyTorch.

Multi-label classification to predict topic tags of technical articles from LinkedInfo.co

This code snippet predicts topic tags from the text of an article. Each article can have one or more tags (usually at least one), and the tags are not mutually exclusive, so this is a multi-label classification problem. It differs from multi-class classification, where the classes are mutually exclusive, i.e., each item belongs to one and only one class. In this snippet, we will use OneVsRestClassifier (the One-Vs-the-Rest strategy) in scikit-learn to handle the multi-label classification.
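As a rough illustration of the one-vs-rest setup described above, here is a minimal sketch. The texts and tags are toy data invented for this example (not the LinkedInfo.co dataset), and the TF-IDF features and `LinearSVC` base estimator are assumptions chosen for brevity rather than the exact configuration of the post.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy articles and tags, made up for illustration only.
texts = [
    "Training a neural network with PyTorch in Python",
    "Python tips for writing web services",
    "Deploying a web service with Kubernetes",
    "Neural network inference on Kubernetes clusters",
]
tags = [
    ["python", "machine-learning"],
    ["python", "web"],
    ["web", "kubernetes"],
    ["machine-learning", "kubernetes"],
]

# Turn the tag lists into a binary indicator matrix: one column per tag,
# and a row may contain several 1s because tags are not mutually exclusive.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# One-vs-rest fits one independent binary classifier per tag column.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
model.fit(texts, Y)

# Predict an indicator row for a new text and map it back to tag names.
pred = model.predict(["A Python neural network tutorial"])
predicted_tags = mlb.inverse_transform(pred)
print(predicted_tags)
```

Because each tag gets its own binary classifier, a prediction can contain zero, one, or several tags, which is what separates this setup from a multi-class model that must pick exactly one class.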

Explore the house prices Kaggle competition

Thanks to [pmarcelino](https://www.kaggle.com/pmarcelino/comprehensive-data-exploration-with-python) and serigne for their great work. This is my second Kaggle competition, taken up to practice data analysis and machine learning. Unlike the Titanic competition, the house prices competition is a regression problem, so it differs considerably from the earlier binary classification task. For this competition, we have 79 variables that describe various aspects of a house, along with its price in the training data set.