Harvard University, the Massachusetts Institute of Technology, and the
University of California, Berkeley, are just some of the schools that you have
at your fingertips with edX. Through massive open online courses (MOOCs) from
the world's best universities, you can develop your knowledge in literature,
math, history, food and nutrition, and more. These online classes are taught
by highly regarded experts in the field. If you take a class on computer
science through Harvard, you may be taught by David J. Malan, a senior
lecturer on computer science at Harvard University for the School of
Engineering and Applied Sciences. But there's not just one professor - you
have access to the entire teaching staff, allowing you to receive feedback on
assignments straight from the experts. Pursue a Verified Certificate to
document your achievements and use your coursework for job and school
applications, promotions, and more. EdX also works with top universities to
conduct research, allowing them to learn more about learning. Using their
findings, edX is able to provide students with the best and most effective
courses, constantly enhancing the student experience.

The job of a data scientist is to glean knowledge from complex and noisy datasets.

Reasoning about uncertainty is inherent in the analysis of noisy data. Probability and Statistics provide the mathematical foundation for such reasoning.

In this course, part of the Data Science MicroMasters program, you will learn the foundations of probability and statistics. You will learn the mathematical theory and get hands-on experience applying it to actual data using Jupyter notebooks.

Concepts covered include: random variables, dependence, correlation, regression, PCA, entropy, and MDL.

Instructors:
Alon Orlitsky

University:
The University of California, San Diego

Reviews: 6/10 stars

17 Reviews for Statistics and Probability in Data Science using Python

Rankings are based on a provider's overall CourseTalk score, which takes into account both average rating and number of ratings. Stars round to the nearest half.

The course started on September 28 and was scheduled for ten weeks. We are still waiting for the week 10 material to be released and the final exam to be provided. Complaints are met with promises that are not kept. This is the most unreliable course I've ever experienced, and it breaks the entire Data Science curriculum of UCSanDiegoX. Not worth the money.

Short version for busy people:
Video content of the course is outstanding and 100% to the point. I really don't understand the people who believe that they can learn Data Science without dwelling on the mathematics heavily used in the field. So, any comments claiming that the content is not relevant to Data Science are plain wrong. Unfortunately, other problems drag the course down, pretty much negating all the excellent effort by Professor Orlitsky. As a paid course it is a complete disaster.
Longer version for everyone else:
The video lectures of this course, delivered by Professor Orlitsky, are plain outstanding. They beat by far everything I've seen so far in any online course on Probability and Statistics (yes, even the challenging offering from MITx by John Tsitsiklis). Theory was illustrated very lavishly with colorful slides and quirky jokes by the instructor. Unfortunately, every piece of additional learning material in this course is seriously lacking, and for the price that was asked I find it unacceptable. Lecture comprehension quizzes are very shallow, and as there are no practice exercises of any kind in this course, the only way for you to tell if you're learning the material well is to perform on weekly quizzes and homework assignments.
Problems start right there. Not all questions are adequately explored in the lectures (they could be tackled in additional materials, but those are not offered), so quite often you'll be looking for solved problem examples elsewhere. Searching the forum was almost useless and TA support was virtually nonexistent (for comparison, the MITx course on Probability had awesome support from the community and TAs and provided a lot of solved exercises and comprehension problems), so if you're having a problem with one of the quizzes or programming assignments (more on that later), you're practically on your own. Maybe you'll get a short answer from a TA in about 3-5 days, when you've already moved on to other tasks.
The course uses Python for programming assignments, which is an awesome idea. However, things go downhill pretty fast there. In the first few weeks, notebooks containing additional learning material were provided, but those waters evaporated very fast and later weeks had only assignments. Instructions for the later labs were seriously lacking, sometimes mentioning only expected results without showing you how exactly those results were obtained. Other times the labs touched on a topic that was not covered in the lectures, leaving me clueless (as I've mentioned earlier, asking on the forum was practically useless).
Another blow below the belt is the weird decision not to include full quizzes and homework assignments for audit learners. So, spent a few weeks with the course and decided to shell out the cash? Great, now you have to go back to Week 1 and all the subsequent weeks you've already covered to solve additional exercises which magically appear after you have paid for the course.
Another negative is the course pace. There are a few hours of videos almost every week, and getting through the later content required much more time than your average online course (taking into account missing examples and vague lab explanations). They claim it's self-paced, but in reality you have about 16 weeks to get through all the material, so you're still on the clock. This was one of the reasons I had to stop with the course and lose the money I've spent on it. In its current form, I'm afraid I can't recommend it.

I thought this was excellent.
The lectures were long, many being about 30 minutes, but that is surely not such a big deal since we can pause them whenever we want.
The content was fun and rigorous. Prof Orlitsky does a great job of getting across fundamental concepts in an engaging way.
I wanted this course to help me improve my understanding of the underlying principles of probability and statistics, and it did do that.
It also held my attention to the end, so that I made sure I watched every lecture.
The time was well spent.
I audited the course, so I paid nothing for it. Given the work that must have gone into the consistently excellent lectures, this is an incredible resource that Orlitsky and UCSD are providing to the world. Good on them!

I wish I could provide mixed scores. The straight math content? Excellent, as were the lectures. The professors were very engaging.
The Python? Not so great... the notebooks in the course all used old Python 2 and there were a lot of minor headaches related to it with the auto-grader.
As part of the first cohort to complete this class, it took almost six months to receive what was meant to be ten weeks of material. Miscues abounded. "It happens."

The course is actually good especially with applications in Python and accompanying notebooks for verified learners. I learned a lot and hope the material will be useful in the real world. It will be great if the next session of the course migrates from Python 2 and Docker to Python 3.

No exams for audit students, and the lectures are too long and slow. The Python-related part is poor. Content was released with delays, which disrupts personal plans. I will wait for the course to be restarted as self-paced, and I hope that exams will be available for audit students.

This is a follow-up to my previous review. It turned out that the course staff decided that there is no exam for audit students in this course. This seems to be a new strategy adopted by some very recent UCSanDiegoX and GTx courses (started in late 2017) and is against the enrollment clause stating "Audit this course for free and have complete access to all the course material, activities, tests, and forums". Until now, neither course @staff nor edX support has provided any official answer addressing the breach of edX policy by those courses.

The course presentation says the following: "In this course, part of the Data Science MicroMasters [...] You will learn both the mathematical theory, and get a hands-on experience of applying this theory to actual data using Jupyter notebooks."
I would argue that you get neither.
When it comes to "applying this theory to actual data using Jupyter notebooks", nothing could be more false.
You will be asked to build functions to find compositions of series of numbers ("if I have a series of numbers: 1, 5, -4, 9, in how many different ways can I order them?").
Nothing to do with Data Science, or with data for that matter.
Now, about the first part of the premise: "you will learn the mathematical theory".
It is true that the course provides the mathematical theory. BUT I would argue that you won't learn it, because:
1) Content is offered as a series of "slides" that are read by the instructor (and worse yet, poorly read since they are not native speakers). So not very engaging.
2) There are not nearly enough exercises to help you cement what is learned. Each video of 15-30 minutes is accompanied by a "poll", not even an exercise. And then, when you answer the "poll", you are not even given the correct answer! Which is frustrating and NOT a good way to learn, because it has happened several times over the course that:
1/ the poll is split 50/50, so no clear majority can tell you whether you were right or wrong
2/ the majority was wrong (this has happened at least once, to the best of my knowledge)!
At the end of each week (a sequence of 8 to 11 videos of 15-30 minutes each) you are given a "comprehension quiz" and a "problem set".
The problem is, if you don't manage to answer correctly, you are NOT EVEN GIVEN the steps that lead to the correct answer.
All the prof took the trouble to do was to write "refer to this video". What if you watched the video but didn't understand how to tackle the problem?
Well, you are on your own, thank you for the $350.
Oh, and remember how the first course of the MicroMasters made you install Python 3.6? Well, the instructors of this class didn't want to take the trouble of updating their content, so you'll have to uninstall your distribution, then reinstall and learn Python 2.x.
I have completed this course because I paid $350 for the certificate of the first course of the MicroMasters and I didn't want it to go to waste. But I would advise anybody who hasn't paid yet NOT to take this class: it is obvious that the instructors put together a course as quickly as they could to get their share of the "Data Science learners" pie, without caring about the learner's experience.
(PS: I rated this course 3 stars just so you would read my review and get an honest account of what is going on, from somebody who completed the course. But the course doesn't deserve three stars.)
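
For readers wondering what the ordering exercise quoted in this review might look like, here is a minimal sketch using Python's standard library. The helper name `count_orderings` is hypothetical and is not taken from the course materials; the actual assignment may have been framed differently.

```python
from itertools import permutations
from math import factorial


def count_orderings(values):
    """Count the distinct orderings of `values` by enumeration (hypothetical helper)."""
    # Collecting into a set drops duplicate orderings when values repeat.
    return len(set(permutations(values)))


series = [1, 5, -4, 9]
print(count_orderings(series))   # enumerated count
print(factorial(len(series)))    # closed-form n! for n distinct values
```

For distinct values the enumerated count agrees with n!, so both lines print the same number; enumeration is only practical for short series.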

The course explains very well many concepts in probability and statistics which I had learned before but found confusing. The instructor is mostly clear and has a good sense of humour. I recommend this course if you want an in-depth understanding of the maths behind data science.

There are many complaints around the course being difficult and teaching things that are not relevant for Data Science. This is so NOT the case.
The course has a robust set of explanations around introductory probability and set theory, set theory being the foundation of frequentist probability.
So far I'm enjoying the course. I would have liked more "practical" applications through Python, but I guess that would've made the course just too long.
It is true that the lectures are behind the original schedule, but so far they have pushed the deadlines accordingly.
Professor Alon does a good job of explaining the theory and providing examples that go along with it.

This course is a solid complement to MIT's edX series "Introduction to Probability".
The course is much shorter and easier than the MIT series, and more practically oriented since it includes programming assignments.
It's also easier than Coursera's Mathematical Biostatistics Boot Camps.
Sadly, there are no advanced statistics online courses available yet (neither the MIT nor the Coursera offerings are advanced).
The course includes decent-quality Jupyter notebooks that allow students to interactively play with the concepts presented in the lectures, but the Python programming assignments and quizzes are trivial. For example, the Week 6 programming assignment consists of just a single line computing Bayes' rule.
The main problem of this course is the complete lack of @staff interaction on the forums. There are no @staff reactions to the mistakes found in the course materials.
Moreover, the materials are released with delays. A minor problem is that some videos are unedited and contain repeated sequences. Another minor problem is that the programming assignments are in Python 2 instead of Python 3, entirely due to the obsolete usage of the print statement instead of the print function, the latter introduced in Python 2.6 in 2008.
Some students report the course being difficult. It may be difficult for someone coming from low profile high school, but the course is trivial for any natural science college sophomore. The passing grade is set to 50% which allows one to obtain a passing grade after about 2/3 of the course and even without attempting the final exam.
Other complaints are that the course does not prepare for the data science subject since it focuses on probability and statistics. Such complaints are made by drop out students who do not understand the data science field. The probability/statistics material presented in this course won't suffice for any serious data science work.
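
As context for the single-line Bayes' rule computation this review mentions, a minimal sketch follows. The function name `posterior` and all the numbers are illustrative assumptions, not the course's Week 6 assignment.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Illustrative (assumed) inputs: P(A), P(B|A), and P(B|not A).
p_a = 0.01
p_b_given_a = 0.9
p_b_given_not_a = 0.05

# The law of total probability gives the evidence term P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(posterior(p_a, p_b_given_a, p_b))
```

The rule itself is the one-line return statement; the rest of the sketch only assembles the evidence term from the assumed inputs.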

Could you please take a look at this course and consider letting us take it again? This time we will have some notice and an idea of how to prepare for this unusually difficult course, and I will free up my schedule for ten weeks so I can dedicate extra time to this one course. I had no problem in the first course at all, and it took 6-8 hours a week. This course was taking 3 hours a day and sometimes even longer. I am good at Python and had no problem using statistics in Python for machine learning, but this course was a show stopper for me, and I dropped it because I had no idea it would be this dense and there was no way I could finish by the deadline. For reference, I am having no problem in Udacity's statistics courses. I also had no problem with Docker or the first part of the course with the coin examples; that's fine. After that it gets super dense, with just way too much crammed into each week. This could easily be two courses. Could you please reply so I know if I should wait to re-attempt the MicroMasters program? Thank you.

I agree with the other two reviews: this course is a complete waste of money if you're looking to learn data science and Python. Judging by the content given so far, this is not a data science course at all but a math course and nothing else. If this course were part of some math certificate, as a person with a math degree I would have given it a 5. Like the other students, I agree this course is very hard and time-consuming, but that would be fine if it said it was a math course. It says it is part of a data science program, yet we have never used a data set or learned any of the data science tools and libraries that Python has for this type of analysis, so at this point you learn nothing that can be used in the real world. Plus, as has been mentioned, the staff never responds to students' issues, and for some reason every week they have technical difficulty uploading the material, so a student can never plan ahead on how to study it: we never know when we will get the videos, and for some reason it takes them another week or two to post the slides for these videos. Not all people can learn by watching videos; we also need the written notes, as some people learn better by reading the material. I also have no idea why the first course got us to install Anaconda and use Python 3, and then this course, the second, makes us download and use Python 2. How about building on what the first course did, and not making students download something else? As has been mentioned, this course is a waste of money, but it is fine to audit and relearn math if you're interested in just learning math. The biggest reason not to pay for this is the complete lack of support by the staff and their inability to fix their own technical issues in a timely manner or to attempt to fix the issues that students have brought up. I've contacted edX many times about this course, and they have let me know that many students have voiced issues about it, and not once has this really been addressed.

I'm sorry to say that this course is quite unlike the previous one (Python for Data Science). It is very difficult and not useful, because there is nothing here about data science or machine learning. Everything here is about coins, dice, and card games; it is like K12 math but harder. I expected to learn the mathematical foundations for machine learning, to learn statistical tools in Python, and to learn to visualize data, but the problem is that there is no data in this course, no datasets, just a lot of math equations. Also, the notebooks are terrible, very old (Python 2), and not related to the course. They are difficult programming assignments without anything useful: just solve this problem about dice and solve this problem about coins or a card game, with nothing real or related to the real world. Also, there are 10 problem sets every week, every problem has 4 questions, and all questions are fill-in-the-blank. Also, there are 10 questions in the Comprehension Quiz every week. Also, there are more than 10 very long videos, each between 20 and 40 minutes. Also, the course staff don't answer most students' questions, but sometimes answer a few questions after many weeks. In the end, this course requires between 25 and 40 hours per week, and you will get a very bad grade. It is a complete waste of time and money. So if you have more than 300 hours to learn the math behind coins, dice, balls, and card games, this course will be very good for you. But if you want to learn Statistics and Probability in Data Science using Python, just stay away.

To be honest, I think Professor Alon Orlitsky is one of the best professors in the statistics field, and he is very clear, so in my opinion he deserves 5 stars. But the content of this course has not been related to Data Science so far (week 7), and it is very heavy. I can't watch box office movies that run more than 2 hours, so how can I watch approximately 4 hours of math videos every week, along with many quizzes, problem sets, and programming assignments? The professor knows that many students in this course are allergic to math and are here just to learn the statistics behind machine learning, very important statistics tools in Data Science like Numpy, Scipy, Itertools, Pandas, and Sklearn, and very important algorithms like Naive Bayes, Regression, Classification, Clustering, Support Vector Machines, Decision Trees, Ensemble Learning, and Random Forests, and how to build all these using math and using only Numpy. I don't know why Professor Alon Orlitsky doesn't focus on very important topics in statistics for Data Science, and I don't know why he just focuses on math equations which are not related to or not important in Data Science, or not used in the real world. And I don't know why the course staff makes the problem sets and programming assignments very long and very difficult. I am writing this review because many students and I have written many complaints on the discussion forums, but nothing has changed so far.

I love the course. I have done several MOOCs, but I haven't posted a review for any of them. I have not yet completed the course, as the rest of it is yet to be posted online. This course is special: the content is exceptional, and I highly doubt any other statistics course could match it in the quality of teaching and the content covered so far. The best part about this course is Prof. Alon Orlitsky. He looks as if he is a very serious guy, but he finds some way to make the slides interactive and fun. The videos were as long as 30 minutes, but I was completely involved in the content because of the fun tidbits in the slides. I have become a fan of Prof. Alon, and I hope he comes out with other mathematical courses, because we really need some serious, good mathematical content on the web. I will keep updating this review as I go along in the course.