Big Data MasterTrack™ Certificate

What You Will Learn

Develop exploratory data analysis and visualization tools using Python and Jupyter notebooks

Apply design principles for a variety of statistical graphics and visualizations including scatterplots, line charts, histograms, and choropleth maps

Apply common data mining algorithms to discover relationships and patterns in large datasets

Implement more advanced learning algorithms such as deep learning and reinforcement learning

Perform scalable data processing operations in cloud computing environments

Course Overview

Learn to navigate large, complex datasets through interactive exploration.

With zettabytes of data being collected annually, governments, companies, and people have more access to data than ever before. With so much data, it can be hard to know where to start looking for important insights or trends to drive business decisions.

Data mining techniques provide the first level of abstraction over raw data by extracting patterns. This makes big data analytics tools increasingly critical, both for supplying meaningful information that informs better business decisions and for applying statistical learning theory to find a predictive function based on data.

You’ll learn to apply mathematical theory and decision-making techniques that are vital to big data analysis, classification, clustering, and association rule mining through real-world projects designed by faculty from Arizona State University.

By committing to online study for 4-6 months, you can earn the Big Data MasterTrack Certificate that will be a pathway to the online Master of Computer Science degree at Arizona State University.

Courses Included

CSE 511 Data Processing at Scale

Database systems are used to provide convenient access to disk-resident data through efficient query processing, indexing structures, concurrency control, and recovery. This course delves into new frameworks for processing and generating large-scale datasets with parallel and distributed algorithms, covering the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.

Specific topics covered include:

• Efficient query processing
• Indexing structures
• Distributed database design
• Parallel query execution
• Concurrency control in distributed parallel database systems
• Data management in cloud computing environments
• Data management in Map/Reduce-based systems
• NoSQL database systems

Learners completing this course will be able to:

• Perform queries (e.g., SQL) and analytics tasks in state-of-the-art database systems
• Apply leading-edge techniques to design/tune distributed and parallel database systems
• Utilize existing NoSQL database systems as appropriate for specified cases
• Perform database operations (e.g., selection, projection, join, and groupby) in state-of-the-art cluster computing systems such as Hadoop/Spark
• Perform scalable data processing operations (e.g., selection, projection, join, and groupby) in cloud computing environments, including Amazon AWS
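The four database operations named in the outcomes above (selection, projection, join, and group by) can be sketched in plain Python. This is only an illustration on small in-memory lists, not how Hadoop, Spark, or any specific system implements them; the table contents and the helper names `select`, `project`, `hash_join`, and `group_sum` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy tables; all names and values are illustrative only.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 15.0},
]
customers = [
    {"customer_id": 10, "city": "Tempe"},
    {"customer_id": 11, "city": "Phoenix"},
]

def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Projection: keep only the named columns of every row."""
    return [{c: r[c] for c in columns} for r in rows]

def hash_join(left, right, key):
    """Equi-join: index the right table by key, then probe it with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def group_sum(rows, key, value):
    """Group by `key` and sum `value` within each group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)

large = select(orders, lambda r: r["amount"] >= 20.0)
joined = hash_join(orders, customers, "customer_id")
order_cities = project(joined, ["order_id", "city"])
totals_by_city = group_sum(joined, "city", "amount")
```

Distributed engines apply the same logical operations, but partition the rows across machines, so a join or group by typically requires moving rows with the same key to the same worker.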

CSE 572 Data Mining

Data mining was once called “knowledge discovery in databases.” Advances in processing power and speed over the last decade have allowed users to move beyond manual, tedious, and time-consuming practices to quick, easy data analysis that harnesses the power of machine learning and high-performance computing. This course will introduce you to the fundamentals of data mining and pattern recognition. You will gain a deeper understanding of data through hands-on experience in the topic areas of big data analysis, classification, clustering, and association rule mining. Advanced topics such as reinforcement learning, deep learning, transfer learning, and Google DeepMind will also be covered. By the end of the course, you will be able to apply state-of-the-art data mining technology to real-world applications, analyze and compare competing techniques, and design optimal solutions for a given set of application-driven constraints.

Specific topics covered include:

• Data Mining Fundamentals
• Machine Learning
• Data Collection
• Deep Learning
• Data Visualization
• Reinforcement Learning
• Data Mining Algorithms

Learners completing this course will be able to:

• Differentiate among major data mining techniques such as classification, cluster analysis, and association rule mining
• Apply common data mining algorithms to discover relationships and patterns in large datasets
• Implement more advanced learning algorithms such as deep learning and reinforcement learning
• Utilize open source tools to build a data mining project to solve a specific problem
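As a small illustration of association rule mining, one of the techniques listed above, the sketch below computes the two standard measures for a rule: support and confidence. The transactions are made up for the example, and the function names are hypothetical; the course materials may use different data and tooling.

```python
# Hypothetical market-basket transactions; the items are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen threshold and then keep the rules whose confidence is high enough.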

CSE 575 Statistical Machine Learning

The link between inference and computation is central to statistical machine learning, which combines the computational sciences with statistics. In addition to artificial intelligence, fields such as information management, finance, bioinformatics, and communications are significantly influenced by developments in statistical machine learning. This course investigates the data mining and statistical pattern recognition that support artificial intelligence. Main topics covered include supervised learning; unsupervised learning; and deep learning, including major components of machine learning and the data analytics that enable it.

Specific topics covered include:

• Probability distributions
• Maximum likelihood estimation
• Naive Bayes
• Logistic regression
• Support vector machines
• Clustering
• Principal component analysis
• Neural networks
• Convolutional neural networks

Learners completing this course will be able to:

• Distinguish between supervised learning and unsupervised learning
• Apply common probability distributions in machine learning applications
• Use cross validation to select parameters
• Use maximum likelihood estimation (MLE) for parameter estimation
• Implement fundamental learning algorithms such as logistic regression and k-means clustering
• Implement more advanced learning algorithms such as support vector machines and convolutional neural networks
• Design a deep network using an exemplar application to solve a specific problem
• Apply key techniques employed in building deep learning architectures
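To make maximum likelihood estimation concrete, here is a minimal sketch for a one-dimensional Gaussian, where the MLE has a closed form: the sample mean and the variance computed with a divisor of n. The sample values and function names are assumptions chosen for illustration, not course material.

```python
import math

def gaussian_mle(samples):
    """Closed-form MLE of the mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by n, not by n - 1.
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var

def log_likelihood(samples, mu, var):
    """Gaussian log-likelihood of the samples under parameters (mu, var)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)

# Hypothetical observations, chosen so the arithmetic is easy to check by hand.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(samples)
best = log_likelihood(samples, mu_hat, var_hat)
```

Evaluating `log_likelihood` at any other mean or variance gives a value no greater than `best`, which is what it means for the estimate to maximize the likelihood.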

CSE 578 Data Visualization

Visual representations generated by statistical models help us to make sense of large, complex datasets through interactive exploration, thereby enabling big data to realize its potential for informing decisions. This course covers techniques and algorithms for creating effective visualizations based on principles from graphic design, visual art, perceptual psychology, and cognitive science to enhance the understanding of complex data.

Specific topics covered include:

• Data transformations
• Exploratory querying
• Statistical graphics
• Time series analysis
• Exploratory spatial data analysis

Learners completing this course will be able to:

• Develop exploratory data analysis and visualization tools using Python and Jupyter notebooks
• Apply design principles for a variety of statistical graphics and visualizations including scatterplots, line charts, histograms, and choropleth maps
• Combine exploratory queries, graphics, and interaction to develop functional tools for exploratory data analysis and visualization

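Who This Is For

Anyone with an undergraduate education in computer science, or with a solid foundation in computer organization and architecture, discrete mathematics, data structures, and algorithms.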

Course Projects

Activity Recognition Using Data Mining

Develop a computing system that can understand human activities. Data is provided for a given activity, specifically eating actions mixed with other unknown activities. The aim is to identify the eating activity amidst the noise.

WHAT YOU WILL LEARN

Apply common data mining algorithms to discover relationships and patterns in large datasets.

Introduction to Statistical Graphics Using Data Visualization

Predict the income of an individual based on different values of input parameters so that a company can tailor its marketing efforts to reach them.

WHAT YOU WILL LEARN

Combine exploratory queries, graphics, and interaction to develop functional tools for exploratory data analysis and data visualization.

Hot Cell Analysis in Big Data

Analyze large spatio-temporal datasets in order to identify statistically significant hot spots using Apache Spark.

WHAT YOU WILL LEARN

Demonstrate handling of computation intensive queries in big data.

Hot Cell Analysis in Statistical Machine Learning

Learn the three major categories of machine learning techniques and apply them to the analysis of a dataset using statistical models.

WHAT YOU WILL LEARN

Understand the machine learning framework based on the fields of statistics and functional analysis.

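Prerequisites

Knowledge of a high-level programming language (e.g., Java) and a scripting language (e.g., Python), relational database structures, and statistics is recommended but not required.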


