What you will learn
Use visual diagnostic tools from Yellowbrick to steer your machine learning workflow
Vectorize text data using TF-IDF
Cluster documents using embedding techniques and appropriate metrics
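To make the TF-IDF outcome above concrete, here is a minimal sketch of the vectorization step using scikit-learn's TfidfVectorizer, assuming a recent scikit-learn release; the three documents are placeholders for illustration only, not the course corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny placeholder corpus used only for illustration (not the course corpus)
corpus = [
    "Machine learning models learn patterns from data.",
    "Text mining extracts information from raw documents.",
    "Clustering groups similar documents together.",
]

# Convert raw text into a TF-IDF weighted document-term matrix
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)

print(X.shape)                                   # (n_documents, n_terms)
print(vectorizer.get_feature_names_out()[:10])   # a few vocabulary terms
```

Each row of X is a sparse vector describing one document; this is the representation that the distance metrics and clustering steps in the course operate on.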
About this Course
Welcome to this project-based course on Analyzing Text Data with Yellowbrick. Tasks such as assessing document similarity, topic modelling, and other text mining endeavors are predicated on the notion of “closeness” or “similarity” between documents. In this course, we define various distance metrics (e.g., Euclidean, Hamming, Cosine, and Manhattan) and examine their merits and shortcomings as they relate to document similarity. We will apply these metrics to documents within a specific corpus and visualize our results. By the end of this course, you will be able to confidently use visual diagnostic tools from Yellowbrick to steer your machine learning workflow, vectorize text data using TF-IDF, and cluster documents using embedding techniques and appropriate metrics.
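To illustrate how the choice of metric changes what counts as “close”, the following sketch (not taken from the course notebook) vectorizes a few placeholder documents with TF-IDF and compares them under several of the metrics named above via scikit-learn's pairwise_distances.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

# Placeholder documents standing in for the course corpus
docs = [
    "The cat sat on the mat.",
    "A cat was sitting on the mat.",
    "Stock prices fell sharply on Monday.",
]

# TF-IDF document-term matrix (made dense so any scipy metric can be used)
X = TfidfVectorizer().fit_transform(docs).toarray()

# The same corpus under different distance metrics; smaller values mean
# a pair of documents is considered more similar under that metric.
for metric in ("euclidean", "manhattan", "cosine", "braycurtis", "canberra"):
    D = pairwise_distances(X, metric=metric)
    print(metric, "\n", D.round(3))
```

Under any of these metrics the two cat sentences should come out closer to each other than either is to the news sentence, but how much closer depends on the metric, which is exactly the kind of trade-off the course examines.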
This course runs on Coursera’s hands-on project platform, Rhyme. On Rhyme, you complete projects directly in your browser. You will get instant access to pre-configured cloud desktops containing all of the software and data you need for the project. Everything is already set up in your browser, so you can focus on learning. For this project, you’ll get instant access to a cloud desktop with Python, Jupyter, Yellowbrick, and scikit-learn pre-installed.
Notes:
– You will be able to access the cloud desktop 5 times. However, you can watch the instruction videos as many times as you want.
– This course works best for learners based in the North America region. We’re currently working on providing the same experience in other regions.
Syllabus
Project: Analyze Text Data with Yellowbrick
Course Project
Introduction and Loading the Corpus
Vectorizing the Documents
Clustering Similar Documents with Squared Euclidean and Euclidean Distance
Manhattan (aka “Taxicab” or “City Block”) Distance
Bray-Curtis Dissimilarity and Canberra Distance
Cosine Distance
What Metrics Not to Use
Omitting Class Labels - Using KMeans Clustering
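To give a rough sense of how the final tasks fit together, here is a hedged sketch that clusters TF-IDF vectors with KMeans while ignoring class labels and then inspects the clusters with Yellowbrick's TSNEVisualizer. It uses the hobbies sample corpus bundled with Yellowbrick purely as a stand-in; the project's corpus, parameters, and exact steps may differ.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from yellowbrick.datasets import load_hobbies   # small sample corpus shipped with Yellowbrick
from yellowbrick.text import TSNEVisualizer

# Load a stand-in corpus (downloaded on first use); the course uses its own corpus
corpus = load_hobbies()

# Vectorize the raw documents with TF-IDF
X = TfidfVectorizer(stop_words="english").fit_transform(corpus.data)

# Cluster without using the class labels (unsupervised KMeans)
clusters = KMeans(n_clusters=5, random_state=42).fit_predict(X)

# Embed the TF-IDF vectors in 2D with t-SNE and color points by cluster
tsne = TSNEVisualizer()
tsne.fit(X, ["cluster {}".format(c) for c in clusters])
tsne.show()
```

Passing the KMeans assignments instead of the true labels mirrors the “omitting class labels” step; swapping in corpus.target would color the embedding by the true categories instead.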