What You Will Learn
Develop exploratory data analysis and visualization tools using Python and Jupyter notebooks
Apply design principles for a variety of statistical graphics and visualizations including scatterplots, line charts, histograms, and choropleth maps
Apply common data mining algorithms to discover relationships and patterns in large datasets
Implement more advanced learning algorithms such as deep learning and reinforcement learning
Perform scalable data processing operations in cloud computing environments
Course Overview
Learn to navigate large, complex datasets through interactive exploration.
With zettabytes of data being collected annually, governments, companies, and people have more access to data than ever before. With so much data, it can be hard to know where to start looking for important insights or trends to drive business decisions.
Data mining techniques provide the first level of abstraction over raw data by extracting patterns, which makes big data analytics tools increasingly critical for informing better business decisions. Statistical learning theory builds on this by finding a predictive function based on the data.
You’ll learn to apply mathematical theory and decision-making techniques that are vital to big data analysis, classification, clustering, and association rule mining through real-world projects designed by faculty from Arizona State University.
By committing to 4-6 months of online study, you can earn the Big Data MasterTrack Certificate, which serves as a pathway to the online Master of Computer Science degree at Arizona State University.
Courses Included
CSE 511 Data Processing at Scale
Database systems are used to provide convenient access to disk-resident data through efficient query processing, indexing structures, concurrency control, and recovery. This course delves into new frameworks for processing and generating large-scale datasets with parallel and distributed algorithms, covering the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.
Specific topics covered include:
• Efficient query processing
• Indexing structures
• Distributed database design
• Parallel query execution
• Concurrency control in distributed parallel database systems
• Data management in cloud computing environments
• Data management in Map/Reduce-based systems
• NoSQL database systems
Learners completing this course will be able to:
• Perform queries (e.g., SQL) and analytics tasks in state-of-the-art database systems
• Apply leading-edge techniques to design/tune distributed and parallel database systems
• Utilize existing NoSQL database systems as appropriate for specified cases
• Perform database operations (e.g., selection, projection, join, and groupby) in state-of-the-art cluster computing systems such as Hadoop/Spark
• Perform scalable data processing operations (e.g., selection, projection, join, and groupby) in cloud computing environments, including Amazon AWS
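The relational operations named above (selection, projection, join, and group-by) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than a cluster system such as Hadoop/Spark, and the table names, columns, and rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical toy tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'West'), (2, 'Bo', 'East'), (3, 'Cy', 'West');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5), (4, 3, 7.5);
""")

# Selection (WHERE keeps matching rows) and projection (only the name column).
west_names = [row[0] for row in conn.execute(
    "SELECT name FROM customers WHERE region = 'West' ORDER BY name")]

# Join plus group-by: total order amount per customer region.
totals = dict(conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.region
"""))

print(west_names)
print(totals)
```

The same four operations carry over to distributed engines, where the join and group-by steps typically require moving data between nodes.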
CSE 572 Data Mining
Data mining was once called “knowledge discovery in databases.” Advances in processing power and speed over the last decade have allowed users to move beyond manual, tedious, and time-consuming practices to quick, easy data analysis that harnesses the power of machine learning and high-performance computing. This course will introduce you to the fundamentals of data mining and pattern recognition. You will gain a deeper understanding of data through hands-on experience in the topic areas of big data analysis, classification, clustering, and association rule mining. Advanced topics such as reinforcement learning, deep learning, transfer learning, and Google's DeepMind will also be covered. By the end of the course, you will be able to apply state-of-the-art data mining technology to real-world applications, analyze and compare competing techniques, and design optimal solutions for a given set of application-driven constraints.
Specific topics covered include:
• Data Mining Fundamentals
• Machine Learning
• Data Collection
• Deep Learning
• Data Visualization
• Reinforcement Learning
• Data Mining Algorithms
Learners completing this course will be able to:
• Differentiate among major data mining techniques such as classification, cluster analysis, and association rule mining
• Apply common data mining algorithms to discover relationships and patterns in large datasets
• Implement more advanced learning algorithms such as deep learning and reinforcement learning
• Utilize open source tools to build a data mining project to solve a specific problem
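As a small illustration of association rule mining, one of the techniques listed above, the sketch below computes support and confidence over a handful of transactions. The basket contents and the helper names (support, confidence, frequent_pairs) are hypothetical, and the brute-force pair search stands in for a real algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute-force search for item pairs whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


print(frequent_pairs(transactions, 0.6))
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule such as bread -> milk is usually kept only when both its support and its confidence exceed chosen thresholds.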
CSE 575 Statistical Machine Learning
The link between inference and computation is central to statistical machine learning, which combines the computational sciences with statistics. In addition to artificial intelligence, fields such as information management, finance, bioinformatics, and communications are significantly influenced by developments in statistical machine learning. This course investigates the data mining and statistical pattern recognition that support artificial intelligence. Main topics covered include supervised learning; unsupervised learning; and deep learning, including major components of machine learning and the data analytics that enable it.
Specific topics covered include:
• Probability distributions
• Maximum likelihood estimation
• Naive Bayes
• Logistic regression
• Support vector machines
• Clustering
• Principal component analysis
• Neural networks
• Convolutional neural networks
Learners completing this course will be able to:
• Distinguish between supervised learning and unsupervised learning
• Apply common probability distributions in machine learning applications
• Use cross-validation to select parameters
• Use maximum likelihood estimation (MLE) for parameter estimation
• Implement fundamental learning algorithms such as logistic regression and k-means clustering
• Implement more advanced learning algorithms such as support vector machines and convolutional neural networks
• Design a deep network using an exemplar application to solve a specific problem
• Apply key techniques employed in building deep learning architectures
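To make the maximum likelihood estimation outcome above concrete, here is a minimal sketch for a one-dimensional Gaussian. The sample values and function names are hypothetical. The closed-form estimates are the sample mean and the variance computed with a divisor of n (not n - 1).

```python
import math


def gaussian_mle(samples):
    """Closed-form MLE of the mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # The MLE of the variance divides by n, which makes it a biased estimator.
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


def gaussian_log_likelihood(samples, mean, var):
    """Log-likelihood of i.i.d. samples under N(mean, var)."""
    n = len(samples)
    squared_error = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)


# Hypothetical observations, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(data)

print(mu_hat, var_hat)
# The estimates should score at least as well as nearby parameter values.
print(gaussian_log_likelihood(data, mu_hat, var_hat))
print(gaussian_log_likelihood(data, mu_hat + 0.5, var_hat))
```

Comparing the log-likelihood at the estimates with that at perturbed values is a quick sanity check that the closed form is indeed a maximizer.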
CSE 578 Data Visualization
Visual representations generated by statistical models help us to make sense of large, complex datasets through interactive exploration, thereby enabling big data to realize its potential for informing decisions. This course covers techniques and algorithms for creating effective visualizations based on principles from graphic design, visual art, perceptual psychology, and cognitive science to enhance the understanding of complex data.
Specific topics covered include:
• Data transformations
• Exploratory querying
• Statistical graphics
• Time series analysis
• Exploratory spatial data analysis
Learners completing this course will be able to:
• Develop exploratory data analysis and visualization tools using Python and Jupyter notebooks
• Apply design principles for a variety of statistical graphics and visualizations including scatterplots, line charts, histograms, and choropleth maps
• Combine exploratory queries, graphics, and interaction to develop functional tools for exploratory data analysis and visualization
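Who This Is For
Anyone with an undergraduate education in computer science, or with a solid foundation in computer organization and architecture, discrete mathematics, data structures, and algorithms.
Course Projects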
Activity Recognition Using Data Mining
Develop a computing system that can recognize human activities. You are given data for a target activity, eating, mixed with other unknown activities. The aim is to identify the eating activity amid the noise.
WHAT YOU WILL LEARN
Apply common data mining algorithms to discover relationships and patterns in large datasets.
Introduction to Statistical Graphics Using Data Visualization
Predict the income of an individual based on different values of input parameters so that a company can tailor its marketing efforts to reach them.
WHAT YOU WILL LEARN
Combine exploratory queries, graphics, and interaction to develop functional tools for exploratory data analysis and data visualization.
Hot Cell Analysis in Big Data
Analyze large spatio-temporal datasets in order to identify statistically significant hot spots using Apache Spark.
WHAT YOU WILL LEARN
Demonstrate handling of computation-intensive queries in big data.
Hot Cell Analysis in Statistical Machine Learning
Learn the three major categories of machine learning techniques and apply them to the analysis of a dataset using statistical models.
WHAT YOU WILL LEARN
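Understand the machine learning framework based on the fields of statistics and functional analysis.
Prerequisites
Knowledge of a high-level programming language (e.g., Java) and a scripting language (e.g., Python), relational database structures, and statistics is recommended but not required.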