Course Overview
The course extends the fundamental tools from "Machine Learning Foundations" into powerful and practical models along three directions: embedding numerous features, combining predictive features, and distilling hidden features.
Course Outline
Lecture 1: Linear Support Vector Machine
more robust linear classification solvable with quadratic programming
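As a concrete illustration, here is a minimal sketch of the hard-margin primal QP, assuming linearly separable data and the cvxopt QP solver; any general QP solver would do, and the later sketches reuse these imports:

```python
import numpy as np
from cvxopt import matrix, solvers  # assumed available; any QP solver works

def hard_margin_svm(X, y):
    """Solve the hard-margin primal QP: min (1/2)||w||^2 s.t. y_n (w @ x_n + b) >= 1.

    X is (N, d), y is (N,) with entries +/-1; the QP variable is u = [b, w].
    Assumes the data is linearly separable, otherwise the QP is infeasible.
    """
    N, d = X.shape
    P = np.zeros((d + 1, d + 1))
    P[1:, 1:] = np.eye(d)                 # quadratic term penalizes w only, not b
    q = np.zeros(d + 1)
    G = -y[:, None] * np.hstack([np.ones((N, 1)), X])   # -y_n [1, x_n] u <= -1
    h = -np.ones(N)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol["x"]).ravel()
    return u[0], u[1:]                    # b, w
```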
Lecture 2: Dual Support Vector Machine
another QP form of SVM with valuable geometric messages and almost no dependence on the dimension of transformation
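A sketch of the matching dual QP under the same assumptions; note that the quadratic coefficients Q[n, m] = y_n y_m (x_n · x_m) touch the inputs only through inner products, which sets up the kernel trick of the next lecture:

```python
def dual_svm(X, y):
    """Hard-margin dual: min (1/2) a^T Q a - 1^T a  s.t.  y^T a = 0, a >= 0."""
    N = len(y)
    Z = y[:, None] * X
    Q = Z @ Z.T                                     # Q[n, m] = y_n y_m x_n . x_m
    sol = solvers.qp(matrix(Q), matrix(-np.ones(N)),
                     matrix(-np.eye(N)), matrix(np.zeros(N)),              # a >= 0
                     matrix(y.reshape(1, -1).astype(float)), matrix(0.0))  # y^T a = 0
    a = np.array(sol["x"]).ravel()
    sv = a > 1e-6                                   # support vectors: a_n > 0
    w = ((a * y)[:, None] * X).sum(axis=0)
    b = y[sv][0] - w @ X[sv][0]                     # complementary slackness at any SV
    return a, w, b
```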
Lecture 3: Kernel Support Vector Machine
kernel as a shortcut to (transform + inner product): allowing a spectrum of models ranging from simple linear ones to infinite dimensional ones with margin control
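For instance, the Gaussian (RBF) kernel evaluates the inner product of an infinite-dimensional transform directly on the raw inputs; replacing x_n · x_m in the dual Q with K(x_n, x_m) gives kernel SVM, with gamma acting as a knob between simple and sophisticated boundaries:

```python
def gaussian_kernel(X1, X2, gamma=1.0):
    """K[n, m] = exp(-gamma * ||x_n - x_m||^2): the inner product after an
    infinite-dimensional feature transform, computed without performing it."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)
```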
Lecture 4: Soft-Margin Support Vector Machine
a new primal formulation that allows some penalized margin violations, which is equivalent to a dual formulation with upper-bounded variables
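In the dual, the only change from the hard-margin case is the upper bound: each variable now lives in the box 0 <= a_n <= C, where C trades margin width against violations. A sketch, again with cvxopt and a precomputed kernel matrix K:

```python
def soft_margin_dual(K, y, C=1.0):
    """Soft-margin dual: the hard-margin dual QP with the box constraint
    0 <= a_n <= C replacing a_n >= 0."""
    N = len(y)
    Q = (y[:, None] * y[None, :]) * K
    G = np.vstack([-np.eye(N), np.eye(N)])          # -a <= 0  and  a <= C
    h = np.hstack([np.zeros(N), C * np.ones(N)])
    sol = solvers.qp(matrix(Q), matrix(-np.ones(N)), matrix(G), matrix(h),
                     matrix(y.reshape(1, -1).astype(float)), matrix(0.0))
    return np.array(sol["x"]).ravel()
```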
Lecture 5: Kernel Logistic Regression
soft-classification by an SVM-like sparse model using two-level learning, or by a "kernelized" logistic regression model using representer theorem
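For the "kernelized" route, the representer theorem says the optimal w is a linear combination of the transformed examples, w = sum_n beta_n z_n, so regularized logistic regression can be optimized directly in beta. A minimal gradient-descent sketch (the step size and iteration count are arbitrary assumptions):

```python
def kernel_logreg(K, y, lam=0.1, eta=0.01, steps=2000):
    """Minimize (lam/N) * beta^T K beta + (1/N) * sum_n log(1 + exp(-y_n s_n)),
    where s = K @ beta, by plain gradient descent on beta."""
    N = len(y)
    beta = np.zeros(N)
    for _ in range(steps):
        s = K @ beta
        grad = K @ (-y / (1.0 + np.exp(y * s))) / N + 2.0 * lam / N * (K @ beta)
        beta -= eta * grad
    return beta
```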
Lecture 6: Support Vector Regression
kernel ridge regression via ridge regression + representer theorem, or support vector regression via regularized tube error + Lagrange dual
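The kernel ridge regression half has a particularly clean closed form: with the representer theorem, the optimal coefficients solve a single linear system. A sketch:

```python
def kernel_ridge(K, y, lam=1.0):
    """Kernel ridge regression: beta = (lam * I + K)^{-1} y, so the prediction
    at a new x is sum_n beta_n * k(x_n, x)."""
    return np.linalg.solve(lam * np.eye(len(y)) + K, y)
```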
Lecture 7: Blending and Bagging
blending known diverse hypotheses uniformly, linearly, or even non-linearly; obtaining diverse hypotheses from bootstrapped data
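A minimal sketch of the bagging half: bootstrap T datasets, train a base learner on each, and vote uniformly (the `base_fit` interface, returning a predict function, is an assumption of this sketch):

```python
def bagging(X, y, base_fit, T=25, seed=0):
    """Bootstrap aggregation: re-sample N examples with replacement T times,
    fit the base learner on each re-sample, and vote the hypotheses uniformly."""
    rng = np.random.default_rng(seed)
    N = len(y)
    models = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)            # bootstrap sample
        models.append(base_fit(X[idx], y[idx]))
    return lambda Xq: np.sign(sum(m(Xq) for m in models))   # uniform vote
```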
Lecture 8: Adaptive Boosting
"optimal" re-weighting for diverse hypotheses and adaptive linear aggregation to boost weak algorithms
Lecture 9: Decision Tree
recursive branching (purification) for conditional aggregation of simple hypotheses
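A compact CART-style sketch for binary labels: at each node, pick the decision stump (feature, threshold) whose two branches are purest by Gini impurity, and recurse (the depth cap and +/-1 labels are assumptions of this sketch):

```python
def fit_tree(X, y, depth=0, max_depth=3):
    """Recursive branching: greedily choose the split that most purifies
    the branches, then build subtrees on each branch."""
    majority = 1 if (y == 1).sum() >= (y == -1).sum() else -1
    if depth == max_depth or len(np.unique(y)) == 1:
        return ("leaf", majority)

    def gini(t):                                    # impurity of one branch
        if len(t) == 0:
            return 0.0
        p = (t == 1).mean()
        return 1.0 - p * p - (1.0 - p) * (1.0 - p)

    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            left = X[:, j] < theta
            score = left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])
            if best is None or score < best[0]:
                best = (score, j, theta)
    _, j, theta = best
    left = X[:, j] < theta
    if left.all() or not left.any():                # degenerate split: stop
        return ("leaf", majority)
    return ("node", j, theta,
            fit_tree(X[left], y[left], depth + 1, max_depth),
            fit_tree(X[~left], y[~left], depth + 1, max_depth))

def predict_tree(tree, x):
    while tree[0] == "node":
        _, j, theta, lo, hi = tree
        tree = lo if x[j] < theta else hi
    return tree[1]
```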
Lecture 10: Random Forest
bootstrap aggregation of randomized decision trees with automatic validation
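A sketch reusing `fit_tree`/`predict_tree` from the previous lecture's sketch: each tree sees a bootstrap sample and a random subset of features, and the examples a tree never saw (its out-of-bag examples) provide the "automatic validation" estimate:

```python
def random_forest(X, y, T=25, seed=0):
    """Bagged, feature-randomized trees with an out-of-bag (OOB) error estimate."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    trees, oob_votes = [], np.zeros(N)
    for _ in range(T):
        idx = rng.integers(0, N, size=N)            # bootstrap sample
        feats = rng.choice(d, size=max(1, d // 2), replace=False)  # random subspace
        tree = fit_tree(X[idx][:, feats], y[idx])
        trees.append((feats, tree))
        for n in np.setdiff1d(np.arange(N), idx):   # OOB examples for this tree
            oob_votes[n] += predict_tree(tree, X[n, feats])
    voted = oob_votes != 0
    oob_error = (np.sign(oob_votes[voted]) != y[voted]).mean()  # self-validation
    predict = lambda x: np.sign(sum(predict_tree(t, x[f]) for f, t in trees))
    return predict, oob_error
```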
Lecture 11: Gradient Boosted Decision Tree
aggregating trees from functional + steepest gradient descent subject to any error measure
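For squared error, the functional negative gradient is simply the residual, so each round fits a regression tree to the residuals and then takes the steepest step size by one-dimensional linear regression. A sketch with an assumed regression-tree learner `tree_fit` returning a predict function:

```python
def gbdt(X, y, tree_fit, T=50):
    """Gradient boosted trees for squared error: fit each tree to the residuals
    (the negative functional gradient), then line-search the step alpha_t."""
    s = np.zeros(len(y))                            # current aggregate scores
    ensemble = []
    for _ in range(T):
        g = tree_fit(X, y - s)                      # tree fitted to residuals
        gx = g(X)
        alpha = (gx @ (y - s)) / (gx @ gx + 1e-12)  # optimal 1-D step size
        s += alpha * gx
        ensemble.append((alpha, g))
    return lambda Xq: sum(a * g(Xq) for a, g in ensemble)
```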
Lecture 12: Neural Network
automatic feature extraction from layers of neurons with the back-propagation technique for stochastic gradient descent
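A minimal one-hidden-layer sketch: the forward pass computes the output, the backward pass propagates error signals layer by layer via the chain rule, and each step updates the weights using a single random example (the sizes, step size, and squared error are assumptions):

```python
def train_nnet(X, y, hidden=8, eta=0.1, steps=10000, seed=0):
    """One-hidden-layer tanh network trained by SGD with back-propagation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden);      b2 = 0.0
    for _ in range(steps):
        n = rng.integers(len(y))                    # one random example: stochastic
        x = X[n]
        z = np.tanh(x @ W1 + b1)                    # forward pass: hidden features
        out = z @ W2 + b2
        delta2 = 2.0 * (out - y[n])                 # output error signal (squared error)
        delta1 = delta2 * W2 * (1.0 - z * z)        # back-propagate through tanh
        W2 -= eta * delta2 * z;          b2 -= eta * delta2
        W1 -= eta * np.outer(x, delta1); b1 -= eta * delta1
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ W2 + b2
```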
Lecture 13: Deep Learning
an early and simple deep learning model that pre-trains with denoising autoencoder and fine-tunes with back-propagation
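A sketch of one pre-training step: corrupt the inputs with noise, then learn an encoding whose decoding reconstructs the clean inputs; stacking the learned encodings layer by layer initializes the deep network before back-propagation fine-tuning (tied encoder/decoder weights and all hyperparameters are simplifying assumptions):

```python
def pretrain_layer(X, hidden, noise=0.1, eta=0.01, epochs=100, seed=0):
    """Denoising autoencoder for one layer: reconstruct clean X from noisy X.
    Returns the encoded features (input to the next layer) and the weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0, 0.1, (d, hidden))
    for _ in range(epochs):
        Xn = X + noise * rng.normal(size=X.shape)   # denoising: corrupt the input
        Z = np.tanh(Xn @ W)                         # encode
        R = Z @ W.T                                 # decode (tied weights)
        E = R - X                                   # reconstruction error vs clean X
        grad = Xn.T @ ((E @ W) * (1.0 - Z * Z)) + E.T @ Z   # gradient of (1/2)||E||^2
        W -= eta * grad / len(X)
    return np.tanh(X @ W), W
```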
Lecture 14: Radial Basis Function Network
linear aggregation of distance-based similarities to prototypes found by clustering
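A sketch: Lloyd's k-means alternates between assigning points to the nearest prototype and moving each prototype to its cluster mean; the RBF network then solves a linear regression on the Gaussian similarities to those prototypes (M, gamma, and the iteration count are assumptions):

```python
def rbf_network(X, y, M=10, gamma=1.0, iters=20, seed=0):
    """k-means prototypes + least-squares linear aggregation of RBF similarities."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), M, replace=False)].copy()   # initial centers
    for _ in range(iters):                          # Lloyd's alternating steps
        assign = ((X[:, None, :] - mu[None]) ** 2).sum(-1).argmin(axis=1)
        for m in range(M):
            if (assign == m).any():
                mu[m] = X[assign == m].mean(axis=0)
    def features(Xq):                               # Gaussian similarity to prototypes
        return np.exp(-gamma * ((Xq[:, None, :] - mu[None]) ** 2).sum(-1))
    beta, *_ = np.linalg.lstsq(features(X), y, rcond=None)  # linear aggregation
    return lambda Xq: features(Xq) @ beta
```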
Lecture 15: Matrix Factorization
linear models of items on extracted user features (or vice versa) jointly optimized with stochastic gradient descent for recommender systems
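A sketch of the SGD loop: each observed rating r_um nudges its user factor and item factor toward explaining that rating, so the two linear models are learned jointly (the factor dimension, step size, and L2 weight are assumptions; `ratings` is a list of (user, item, rating) triples):

```python
def matrix_factorization(ratings, n_users, n_items, k=10, eta=0.01, lam=0.1,
                         epochs=20, seed=0):
    """Approximate r_um by v_u . w_m; SGD on one observed rating at a time."""
    rng = np.random.default_rng(seed)
    V = rng.normal(0, 0.1, (n_users, k))            # user feature vectors
    W = rng.normal(0, 0.1, (n_items, k))            # item feature vectors
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - V[u] @ W[m]                   # residual of the current model
            V[u] += eta * (err * W[m] - lam * V[u]) # update user features
            W[m] += eta * (err * V[u] - lam * W[m]) # update item features
    return V, W
```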
Lecture 16: Finale
summary from the angles of feature exploitation, error optimization, and overfitting elimination towards practical use cases of machine learning