Distributed Machine Learning with Apache Spark

8/10 stars, based on 8 reviews
Cost: FREE
Start Date: TBA

Course Details

Cost

FREE

Upcoming Schedule

  • TBA

Course Provider

edX online courses
Harvard University, the Massachusetts Institute of Technology, and the University of California, Berkeley, are just some of the schools that you have at your fingertips with edX. Through massive open online courses (MOOCs) from the world's best universities, you can develop your knowledge in literature, math, history, food and nutrition, and more. These online classes are taught by highly-regarded experts in the field. If you take a class on computer science through Harvard, you may be taught by David J. Malan, a senior lecturer on computer science at Harvard University for the School of Engineering and Applied Sciences. But there's not just one professor - you have access to the entire teaching staff, allowing you to receive feedback on assignments straight from the experts. Pursue a Verified Certificate to document your achievements and use your coursework for job and school applications, promotions, and more. EdX also works with top universities to conduct research, allowing them to learn more about learning. Using their findings, edX is able to provide students with the best and most effective courses, constantly enhancing the student experience.

Provider Subject Specialization
Sciences & Technology
Business & Management
22215 reviews

Course Description

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

Reviews 8/10 stars
8 Reviews for Distributed Machine Learning with Apache Spark

Ratings details

  • 5 stars
  • 4 stars
  • 3 stars
  • 2 stars
  • 1 stars

Rankings are based on a provider's overall CourseTalk score, which takes into account both average rating and number of ratings. Stars round to the nearest half.

Nitish Singh

3/10 stars, Dropped
1 year, 9 months ago
Very outdated course. These are materials one can find all over the internet. It does not delve into the nitty-gritty and just uses some of the examples already on the Spark homepage.

student

4/10 stars, Completed
2 years, 9 months ago
If you expect to learn Spark from the lecture videos in this course, you will probably be disappointed. This also holds for the other courses in this specialization. If I had to speculate, these courses seem to have been created by the marketing department of Databricks to increase the adoption of Spark by putting together a whole bunch of different and often irrelevant lectures. One could say the best part of these courses is the labs, but even the labs are sometimes outdated and don't reflect the most recent state of Spark. For instance, while the official Spark pages strongly suggest that DataFrame-based APIs should be used for machine learning applications from now on, many parts of the labs are based on RDD APIs.

student

10/10 stars, Taking Now
2 years, 10 months ago
The course is a very good practical course on the ever-changing world of Big Data ML. Great content that is quite good and easy to follow.

Dan Golding

8/10 stars, Completed
2 years, 10 months ago
This was a really great course. It was particularly interesting to see how familiar (albeit basic) machine learning techniques can be adapted to be able to handle big data problems. The labs are really hands on which is nice. One criticism is that because the labs are done entirely on the Databricks platform, the course does not teach you how you would go about implementing your own solutions (i.e. how you would set up your own Spark clusters).

Aakash Moghariya

10/10 stars, Completed
2 years, 10 months ago
I am in love with the labs that this course offers. It is amazing to practice lambda functions and learn how DataFrames work in Spark, in addition to learning to implement distributed machine learning algorithms. I am really looking forward to the advanced version of this course. This is one of the must-take courses for people who are looking to explore the fundamentals of Spark.

Aleksey Izmailov

10/10 stars, Completed
2 years, 10 months ago
Fantastic labs that show how distributed ML and DS can be done in practice. Really makes you go from data to solution and understand every step it takes. Great example of how ML pipelines can be built and the way of thinking one can use for that. Spark notebooks make it easy to visualize and incrementally build the solution.

Karthik Ramakrishnan Ramakrishnan

9/10 stars, Completed
2 years, 10 months ago
The course brought the best out of me. I had previous knowledge of machine learning, but this took it to a whole new level with scalable and distributed algorithms. I gained a lot from it, and it will surely be a milestone on my path to becoming a data scientist.

Ambarish Banerjee

10/10 stars, Taking Now
2 years, 10 months ago
It is a difficult course but every bit is worth it. I feel that I learnt something very useful. I think whoever aspires to get into data science/analysis should consider taking this class.