What you will learn
Apply data cleaning basics to make data "tidy"
Obtain usable data from the web, APIs, and databases
Understand common data storage systems
Use R for text and date manipulation
Course overview
Before you can work with data, you have to get some. This course covers the basic ways to obtain data: from the web, from APIs, from databases, and from colleagues, in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speeds up downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. Overall, it teaches the basics needed for collecting, cleaning, and sharing data.
Syllabus
Week 1
2 hours to complete
In this first week of the course, we look at finding data and reading different file types.
9 videos (67 minutes total), 4 readings, 1 quiz
Week 2
1 hour to complete
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools for extracting data from the web or from databases such as MySQL.
5 videos (41 minutes total), 1 quiz
Week 3
10 hours to complete
Welcome to Week 3 of Getting and Cleaning Data! This week the lectures focus on organizing, merging, and managing the data you collected using the methods from Weeks 1 and 2.
7 videos (60 minutes total), 1 reading, 4 quizzes
Week 4
6 hours to complete
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.