Scalable Machine Learning

9/10 stars
based on 20 reviews
Cost FREE
Start Date TBA

Course Details

Course Provider

edX online courses
Harvard University, the Massachusetts Institute of Technology, and the University of California, Berkeley, are just some of the schools that you have at your fingertips with edX. Through massive open online courses (MOOCs) from the world's best universities, you can develop your knowledge in literature, math, history, food and nutrition, and more. These online classes are taught by highly regarded experts in the field. If you take a class on computer science through Harvard, you may be taught by David J. Malan, a senior lecturer on computer science at Harvard University for the School of Engineering and Applied Sciences. But there's not just one professor: you have access to the entire teaching staff, allowing you to receive feedback on assignments straight from the experts. Pursue a Verified Certificate to document your achievements and use your coursework for job and school applications, promotions, and more. edX also works with top universities to conduct research, allowing them to learn more about learning. Using their findings, edX is able to provide students with the best and most effective courses, constantly enhancing the student experience.

Provider Subject Specialization
Sciences & Technology
Business & Management
24465 reviews

Course Description

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘Big Data,’ with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.
 
This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Apache Spark, a cluster computing system well-suited for large-scale machine learning tasks. You will implement scalable algorithms for fundamental statistical models (linear regression, logistic regression, matrix factorization, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.
 
This self-assessment document provides a short quiz, as well as online resources that review the relevant background material. 

Reviews 9/10 stars
20 Reviews for Scalable Machine Learning

Rankings are based on a provider's overall CourseTalk score, which takes into account both average rating and number of ratings. Stars round to the nearest half.

Claudio Felicioli
9/10 stars | Completed
  • 58 reviews
  • 58 completed
5 years, 11 months ago
A nice, hands-on, simple introduction to parallel computing using Apache Spark, with a focus on machine learning problems. Several different cases of distributing big matrix computations are explained in detail. The coding labs are well structured, and you will learn how to code algorithms like distributed principal component analysis from scratch.
Kristina Šekrst
10/10 stars | Completed
  • 102 reviews
  • 102 completed
6 years, 1 month ago
I'm glad to have learned a bit more about Spark, since for me it was a buzzword until these courses. I'm a bit sad I didn't have enough time to enroll in the previous course, but I've enjoyed this one very much. Python notebooks were a great way to learn stuff dynamically. The labs were difficult, but worth the trouble. Can't wait for more courses from Berkeley!
Borys Zibrov
10/10 stars | Taking Now
  • 9 reviews
  • 8 completed
6 years, 2 months ago
Taking this as a follow-up to the Apache Spark for BigData course. It's still an introductory course and a bit light on the subject, but if you had to cover Spark, machine learning, and then scaling techniques in detail, that would be quite a course! I took Mining Massive Datasets from Jeff Ullman, and it was much, much heavier on math and theory; in fact, there was only one optional real-world programming assignment there as far as I remember (and I tried for a long time to solve it, and it was really, really hard for me). So this course strikes a good balance between theory, scalability, and programming assignments, and thus was able to teach me more, I guess. The IPython notebook format is quite good, though the assignments are a bit harder here than in the BigData course, and I had to scroll up and down quite often to look up variable names etc. Anyway, a very good introductory course. Would love to see more advanced and difficult stuff to come.
Mimanshu Shisodia
9/10 stars | Completed
6 years, 3 months ago
The course was really insightful, and getting to know Apache Spark was the best part. The basics were used to demonstrate the scalability of machine learning concepts, and you will develop a very clear understanding by the end. The assignments practically spoon-fed you in their explanations, yet were still very helpful.
7/10 stars | Taking Now
  • 2 reviews
  • 0 completed
6 years, 3 months ago
Will I learn how to build a spam filter from this course? I want a reply urgently; please reply to me here or send me an e-mail at ay,ah86@yahoo.com
Christopher Cameron
10/10 stars | Completed
6 years, 3 months ago
An engaging and efficient survey of scalable ML techniques in Spark. Very good labs backed by good lectures. I highly recommend the course. If you meet the prerequisites, the required effort estimates are accurate. If you are trying to learn Python while doing the labs, you may spend 2-4x more time than estimated.
Martin Strandbygaard
8/10 stars | Completed
  • 2 reviews
  • 2 completed
6 years, 4 months ago
Overall a good course that is worth spending time on if you want a basic introduction to solving machine learning problems using Apache Spark. As with the precursor, CS100.1x, the lecture videos and quizzes are pretty light on actual content and nothing spectacular. However, as with the precursor, I found the assignments really well structured, interesting, and informative. They use IPython notebooks, which I found to be a really awesome format for this kind of course and assignments. The course is not heavy on the mathematics of machine learning algorithms, and its introduction to the algorithms used is very basic. For that, something like Machine Learning on Coursera is a much better course. What this course does is give you a good introduction to solving some actual problems using a selection of machine learning algorithms with Apache Spark.
I found some of the assignments for this course to be easier than some of the later assignments for the introductory course CS100.1x. I had a hard time deciding whether this course should get 3 or 4 stars, but ended up with 3 stars. The assignments definitely rate 4 stars, and I think that is the most important aspect of the course. I think the lecture videos only rate 3 stars. For comparison, watch the lectures from Machine Learning on Coursera, which I believe rate 5 stars.
Samir Damle
10/10 stars | Completed
  • 1 review
  • 1 completed
6 years, 4 months ago
I just finished this course a couple of days ago. Tons of thanks to the instructors! I am a designer, but now I have gained a lot of confidence in data analytics. This is a great course for someone who wants to get into data science and machine learning. It is intensive because it introduces a lot of new concepts from algebra and statistics that can be hard to digest at the start, for instance gradient descent, one-hot encoding, PCA, etc. But with practice, you can get the hang of it. The labs (exercises) are so well designed that I was amazed. Each lab takes you from the very basics to challenging levels. The community is very active on the forum (Piazza), and there are thousands of students taking the course and ready to help out. I think it helps a lot to know Python programming before taking this course. Understanding matrix algebra would also help.
Kent English
10/10 stars | Completed
6 years, 4 months ago
This course was really well thought out. The videos are of a nice length (roughly 10 minutes on average), and the content is presented very clearly, so they can easily be sped up without losing clarity. A lot of time must have gone into the preparation of the labs; they are detailed, do a good job of building up ideas incrementally, and cover interesting material. If you're already familiar with the machine learning algorithms presented, there will be some review, but the exposition is nicely done, especially the focus on how the algorithms can be parallelized (which, of course, is the point of the course). I only wish it had been longer so that more algorithms could have been covered.
Xiaodong Dang
8/10 stars | Completed
6 years, 4 months ago
I have to say thank you for providing such a great learning opportunity. I am switching my career to data analytics and upskilling through every resource I can find. This is a nice course for studying machine learning for big data. Unlike many other online courses, the videos don't contain many instructions about the coding labs; this makes the homework more challenging and requires me to learn many things by myself, like modular design, the input/output of each function to be defined, and specific Python or NumPy libraries. It feels wonderful when you solve each problem, get rid of each bug, and see the tests pass. I am very happy with the course and would like to try out more aspects of machine learning on Spark, like neural networks and decision trees. I would also be very interested in using R on Spark.
Brinda Mo
10/10 stars | Completed
  • 1 review
  • 1 completed
6 years, 4 months ago
The course was very useful. I found the second and third assignments hard, but once I got used to Spark it was much easier. The assignments were very well organized. I learnt a great deal in this course. This course is the right mix of theory and practice.
Student
10/10 stars | Completed
I work in the Business Intelligence space and have been working with distributed systems for a few years now. I am a Databricks Certified Spark Developer, and for the past 6 months I have been working in the machine learning space, implementing learning algorithms in Spark with MLlib + Scala. I have been using Python + Pandas for a few years for data cleansing etc. Along with this course, I am also taking Andrew Ng's Machine Learning course. With that said, when I first enrolled in this course, my expectation was to learn practical techniques for implementing learning algorithms in a distributed environment. Now that I have completed the course, I can say that it strikes a perfect balance of introducing learning algorithms, explaining the underlying theory succinctly, and at the same time providing an ample amount of practical techniques via the labs.
For instance, the way the lectures explained the one-hot encoding technique, and the way the labs introduced its implementation starting with a small dataset and gradually implementing the same on a larger distributed dataset, was really helpful. My biggest takeaway was how each week the lectures explained each algorithm's implementation in a distributed and scalable fashion. The lectures show how each algorithm can scale as a set of map and reduce operations in a data-parallel computing environment; for example, one lecture's explanation of matrix multiplication as a set of outer products computed in the map step and summed in the reduce step was very beneficial. Overall, I think it's a very good introductory course on distributed machine learning, and UC Berkeley and Databricks have done a tremendous job presenting the underlying theoretical concepts as well as great practical applications without being too overwhelming for the learner. The instructors were also engaging and very responsive on the course forums. I hope that in the near future there will be an advanced version of the course with more advanced algorithms and some new practical examples.
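The outer-product view of matrix multiplication mentioned in the review above can be illustrated with a short pure-Python sketch. This is not the course's lab code: the function names (`outer_product`, `add_matrices`, `matmul_outer`) are hypothetical, and Python's built-in `map` and `functools.reduce` stand in for the distributed map and reduce steps. In PySpark the same shape would typically be expressed with `RDD.map` followed by `RDD.reduce`, but that version is not shown here.

```python
from functools import reduce
from typing import List

Matrix = List[List[float]]


def outer_product(column: List[float], row: List[float]) -> Matrix:
    """Outer product of a column vector and a row vector."""
    return [[c * r for r in row] for c in column]


def add_matrices(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul_outer(a: Matrix, b: Matrix) -> Matrix:
    """Compute A times B as the sum over k of (column k of A) outer (row k of B).

    Map step: one outer product per shared index k. Each k is independent,
    so in a cluster setting these could be computed on different workers.
    Reduce step: sum all the partial outer products.
    """
    if not a or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    shared = range(len(b))  # number of columns of A == number of rows of B
    columns_of_a = ([row[k] for row in a] for k in shared)
    partials = map(outer_product, columns_of_a, b)  # map step: pairs column k with row k
    return reduce(add_matrices, partials)  # reduce step: sum the partial matrices


if __name__ == "__main__":
    print(matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19, 22], [43, 50]]
```

Because each outer product depends only on one column of A and the matching row of B, the map step parallelizes naturally; the trade-off is that every partial result is a full-size matrix that the reduce step must sum.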
10/10 stars | Completed
  • 1 review
  • 1 completed
6 years, 4 months ago
Excellent course! I learned Python, Spark, machine learning, and cool applications from doing the labs. Best of all, its flexible schedule allowed me to learn during nights and weekends at home. It's free and open for all (I paid $50 for the verified certificate as my support). Thanks a lot! I hope you'll offer more courses on ML in the future.
Yutong Li
10/10 stars | Completed
6 years, 4 months ago
A two-star rating makes no sense at all. I am very curious and would like to ask how long that reviewer spent finishing each lab. Saying the labs are very easy is nothing but irresponsible. I have two master's degrees in computer science from two top universities, including Carnegie Mellon. The time I spent finishing the labs (including note-taking, my personal learning habit) was basically 6-10 hours cumulatively, depending on the difficulty and domain background. After taking the course Introduction to Spark, I continued with this follow-up course to get the UC Berkeley Big Data XSeries certificate. Yes, this course is just a shortened course in machine learning. But considering its length (5 weeks), it could not cover as much as a regular semester course you would take on campus; that's why it only covers linear regression, logistic regression, and PCA. What's more important, at least for me, is that after completing all the lectures and labs, I would say it has given me quite a lot.
Recalling the details that I learned from the lectures and labs always makes me excited and grateful to the course developers. The course greatly improved my knowledge and skills in what it covers, although the covered topics are not that broad. However, let me also honestly point out a couple of things the course may need to improve: I found some parts of the lectures hard to understand. I also found small errors in the labs (in two places; I can't remember where, but they can be identified if you really know how to do the lab). Other than that, everything is great. Overall, I learned a lot, and I am very grateful to the course developers who gave me such a rewarding experience.
Greg Hamel
8/10 stars | Completed
  • 115 reviews
  • 106 completed
6 years, 4 months ago
Scalable Machine Learning is a 5-week distributed machine learning course offered by UC Berkeley through the edX platform. It is a follow-up to another UC Berkeley course: Introduction to Big Data with Apache Spark. Although the first course is not a strict prerequisite, Scalable Machine Learning uses the same virtual machine and even has some overlap with the homework labs, so it is beneficial to take Introduction to Big Data first. Scalable Machine Learning teaches distributed machine learning basics using PySpark, Apache Spark's Python API. Basic proficiency with Python is necessary to pass the course, and some exposure to algorithms and machine learning concepts is helpful. Course evaluation is based primarily on 5 labs distributed as IPython notebooks. The first two weeks of the course cover machine learning basics and introduce Apache Spark. For students already familiar with machine learning basics who took Introduction to Big Data, there's not much new to learn during the first two weeks.
Week 2 is essentially an exact clone of week 2 of the Intro to Big Data course, including the lab assignment. The final 3 weeks have meatier lecture content and longer labs, each covering a different machine learning technique: linear regression, logistic regression, and principal component analysis. The lecture content is clean and the lecturer speaks clearly. His delivery isn't perfect, but the only real purpose of the lectures is to serve as background information for the meat of the course: the labs. Each lab is a lengthy IPython notebook with several sections leading you through the process of creating a pipeline for running a machine learning algorithm with PySpark. Much of the code you need is provided for you, but writing the key functions and data transformations necessary to complete the labs can still be time-consuming. Little things like an ambiguous instruction or an uncaught error you made earlier in the assignment can result in bugs that take a while to squash. Despite occasional frustrations, the labs do a good job of interspersing instruction with practical, hands-on learning. Scalable Machine Learning is a quality introduction to machine learning with PySpark that focuses on labs over lectures. The lectures could be better, and some of the instructions and error checks in the labs could be more comprehensive, but this is a great course for those looking to learn by doing. I give Scalable Machine Learning 4 out of 5 stars: Very Good.
Chaoran Yu
10/10 stars | Completed
6 years, 4 months ago
This course is very practical. I learned a lot about implementing common ML algorithms in a distributed fashion. Thank you, course staff, for the great efforts!
R Stratton
8/10 stars | Taking Now
6 years, 4 months ago
Well thought out. Note that the content is entirely geared towards prediction, with zero interest in inference.
Mateusz B.
4/10 stars | Taking Now
  • 1 review
  • 0 completed
6 years, 4 months ago
I really expected something different. This is a super basic course on machine learning; the level of the content is surprisingly low. I wanted to learn about scaling machine learning and distributed machine learning (assuming basic knowledge of ML), and this version of the course barely touches those subjects. If you want an introduction to ML, Andrew Ng's course is much better. The labs are very, very easy. You just need to fill in some obvious lines in the code; 90% of the code is already there. If you have done another course on machine learning, don't take this one; it's a waste of time.
Student
10/10 stars | Taking Now
6 years, 4 months ago
Theory explained in detail and hands-on labs are nicely put together. Currently working on my 4th lab, and so far I have been thoroughly enjoying this course. Thanks Berkeley for putting it together.
BHAWANI KAFLE
8/10 stars | Taking Now
  • 1 review
  • 0 completed
6 years, 5 months ago
This course is unique and more practical.