Warning: WP Redis: Connection refused in /www/wwwroot/cmooc.com/wp-content/plugins/powered-cache/includes/dropins/redis-object-cache.php on line 1433
基于Apache Spark的分布式机器学习 | MOOC中国 - 慕课改变你,你改变世界

基于Apache Spark的分布式机器学习

Distributed Machine Learning with Apache Spark

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark.

888 次查看
加州大学伯克利分校
edX
  • 完成时间大约为 4
  • 中级
  • 英语
注:因开课平台的各种因素变化,以上开课日期仅供参考

你将学到什么

The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines

Exploratory data analysis, feature extraction, supervised learning, and model evaluation

Application of these principles using Spark

How to implement distributed algorithms for fundamental statistical models

课程概况

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

预备知识

Python programming background
experience with PySpark equivalent to CS105x: Introduction to Spark
comfort with mathematical and algorithmic reasoning
familiarity with basic machine learning concepts
exposure to algorithms, probability, linear algebra and calculus

千万首歌曲。全无广告干扰。
此外,您还能在所有设备上欣赏您的整个音乐资料库。免费畅听 3 个月,之后每月只需 ¥10.00。
Apple 广告
声明:MOOC中国十分重视知识产权问题,我们发布之课程均源自下列机构,版权均归其所有,本站仅作报道收录并尊重其著作权益。感谢他们对MOOC事业做出的贡献!
  • Coursera
  • edX
  • OpenLearning
  • FutureLearn
  • iversity
  • Udacity
  • NovoEd
  • Canvas
  • Open2Study
  • Google
  • ewant
  • FUN
  • IOC-Athlete-MOOC
  • World-Science-U
  • Codecademy
  • CourseSites
  • opencourseworld
  • ShareCourse
  • gacco
  • MiriadaX
  • JANUX
  • openhpi
  • Stanford-Open-Edx
  • 网易云课堂
  • 中国大学MOOC
  • 学堂在线
  • 顶你学堂
  • 华文慕课
  • 好大学在线CnMooc
  • (部分课程由Coursera、Udemy、Linkshare共同提供)

© 2008-2022 CMOOC.COM 慕课改变你,你改变世界