Big Data Analysis with Apache Spark

Provided by: edX
8/10 stars, based on 9 reviews
Cost FREE
Start Date TBA

Course Details

Cost

FREE

Upcoming Schedule

  • TBA

Course Provider

edX online courses
Harvard University, the Massachusetts Institute of Technology, and the University of California, Berkeley, are just some of the schools that you have at your fingertips with edX. Through massive open online courses (MOOCs) from the world's best universities, you can develop your knowledge in literature, math, history, food and nutrition, and more. These online classes are taught by highly regarded experts in the field. If you take a class on computer science through Harvard, you may be taught by David J. Malan, a senior lecturer on computer science at Harvard University for the School of Engineering and Applied Sciences. But there's not just one professor: you have access to the entire teaching staff, allowing you to receive feedback on assignments straight from the experts. Pursue a Verified Certificate to document your achievements and use your coursework for job and school applications, promotions, and more. edX also works with top universities to conduct research, allowing them to learn more about learning. Using their findings, edX is able to provide students with the best and most effective courses, constantly enhancing the student experience.

Provider Subject Specialization
Sciences & Technology
Business & Management
22043 reviews

Course Description

Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.

This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to deliver against these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), and previous experience with Spark, equivalent to Introduction to Apache Spark, is required.

Reviews 8/10 stars
9 Reviews for Big Data Analysis with Apache Spark

Rankings are based on a provider's overall CourseTalk score, which takes into account both average rating and number of ratings. Stars round to the nearest half.

student

10/10 stars (Taking Now)
1 year, 4 months ago
Hi, I am interested in taking this course. Could I get access to the same set of materials? Please notify me whenever the course is back.

Priyanka Solanki

10/10 stars (Taking Now)
1 year, 7 months ago
I want to watch the videos for this course. Is it possible to get them now? Here it shows that it's not currently available.

Akshay Vaghani

10/10 stars (Taking Now)
1 year, 7 months ago
Hi, I would like to view this course. Is it possible to get access to this course somehow? I am fine with just watching the video lectures. Thanks.

Xixi Wang

10/10 stars (Completed)
2 years, 7 months ago
Good hands-on lab to get you started quickly. But the lecture is not closely related to the lab. Better to take it alongside a book on Spark.

student

10/10 stars (Completed)
2 years, 7 months ago
Great course organization, especially the balance between theory and practice. Some tasks were too easy and some were not clear at first, but a Piazza search usually helped. I consider this a very good PySpark tutorial with explanations of Spark's key features.

student

6/10 stars (Completed)
2 years, 7 months ago
A lot of overlap with the 2 other courses of the XSeries. I would definitely not advise taking this course if you took them. The last of the 4 weeks consists of only 20 minutes of video explaining very basic statistical concepts.

Shankar K

6/10 stars (Taking Now)
2 years, 7 months ago
This should have been released before CS120x: Distributed Machine Learning with Apache Spark.

Sudip Chahal

10/10 stars (Completed)
2 years, 7 months ago
Professor Anthony Joseph, UCB AMPlab, Databricks and the entire team are to be congratulated for developing an absolutely superb Spark class. In my view, they set the bar for others to aspire to in communicating complex concepts in a MOOC format. In this series, learning is about 95% by doing the labs, and Professor Joseph and team have developed a whole series of superlative labs that build up step by step to completely non-trivial capabilities. They guide you step by step, making sure one cannot veer too far off the track and taking care of a lot of the mundane aspects, leaving the student to concentrate on the primary concepts at hand. Make no mistake - the labs are not easy - there are a lot of new concepts to absorb, but they have structured it about as well as it can be, leveraging Databricks' free community edition and its excellent support for Python notebooks (their format is similar to but probably different from IPython).
My only criticism would be that the Piazza discussion forum could be better organized, i.e., not just at the lab level but with separate forums for each section of the lab, which would make finding the discussions far easier. Once again - a big thank you and congratulations to the professor and team for putting together an outstanding series of classes. It would be great if that team could offer more classes in the series.

student

4/10 stars (Taking Now)
2 years, 7 months ago
If you expect to learn Spark from the lecture videos in this course, you will probably be disappointed. This also holds for the other courses in this specialization. If I had to speculate, these courses seem to have been created by the marketing department of Databricks to increase the adoption of Spark by putting together a whole bunch of different and often irrelevant lectures. One could say the best part of these courses is the labs, but even the labs are sometimes outdated and don't reflect the most recent state of Spark. For instance, while the official Spark pages strongly suggest that DataFrame-based APIs should be used for machine learning applications from now on, many parts of the labs are based on RDD APIs.
