Business Metrics for Data-Driven Companies is the first course in the “Excel to MySQL: Analytic Techniques for Business Specialization” offered by Duke University through Coursera. This short 4-week, self-paced course introduces the concept of business metrics and the role they play in business analytics. It also spends some time discussing the various data-centric roles at different types of companies. The course has no prerequisites and grading is based on 3 multiple-choice quizzes and a final case-study assignment. The lecture content in Business Metrics is crisp and the lecturer is easy to understand. There are only 3 short weeks of lecture content as the final week is devoted to the case study. The peer-graded case study assignment involves identifying and explaining a business metric in a fictitious business. The course explains several common business metrics in detail but doesn’t spend as much time on how to use metrics to formulate questions, inform analysis or make decisions. Hopefully these are topics that will be covered in more detail in some of the upcoming courses in the specialization. Business Metrics for Data-Driven Companies is a good overview of business metrics and business data culture. As the first part of a larger specialization, it concludes before you get to use the metrics you learn about in any sort of analysis. The value of this course will ultimately depend upon whether the follow-up courses make good use of the foundation it lays. As a standalone course, I give Business Metrics for Data-Driven Companies 3.5 out of 5 stars: Good.
Text Mining and Analytics is the fourth course in the Data Mining specialization offered by the University of Illinois at Urbana-Champaign through Coursera. Text Mining builds upon the second course in the specialization, Text Retrieval and Search Engines. Course topics include mining word relations, topic discovery, text clustering, text categorization and sentiment analysis. The course lists programming proficiency (especially in C++) and knowledge of probability and statistics as prerequisites. Keeping with the system established by other data mining specialization track courses, grading is based entirely upon 4 multiple choice quizzes with 10 questions apiece. You only get one attempt at the quizzes. Text Mining and Analytics is information-packed. Each week has 2.5 to 4 hours of lecture content in video segments that generally range from 10 to 20 minutes. The video quality is satisfactory but the explanations and content on the slides could be a bit clearer. Despite the long videos, there are no comprehension questions or exercises to interact with during or after lecture segments to reinforce learning. By the time you reach the quiz at the end of the unit, you may find yourself having to go back and review certain videos to answer the questions. There is an optional programming assignment. Text Mining and Analytics covers many useful data mining topics, but it has too much lackluster video content for its own good. I can’t help but feel like a better course would have been able to condense the videos down to cover the same topics more clearly in half the time, leaving room for more quizzes and exercises. This course could serve as useful reference material but students watching straight through may find a lot of information going in one ear and out the other. I give Text Mining and Analytics 2.5 out of 5 stars: Mediocre.
Cluster Analysis in Data Mining is the third course in Coursera's new data mining specialization offered by the University of Illinois Urbana-Champaign. The course is a 4-week overview of data clustering: unsupervised learning methods that attempt to group data into clusters of related or similar observations. The course covers the two most common clustering methods--K-means and hierarchical clustering--as well as more than a dozen other clustering algorithms. Grading is based on 4 weekly quizzes with 3 attempts each. Cluster Analysis is taught by Professor Jiawei Han, who was the instructor for the first course in the data mining specialization: Pattern Discovery in Data Mining. The quality of the slides, instruction and organization of materials in this course is slightly better than the pattern discovery course, but that isn't saying much: it is still below Coursera's usual high standards. The course rushes from one topic to another with instruction that is mediocre at best and downright confusing at worst. That's not to say you can't learn anything from this course, but the instruction is often more of a hindrance than a help. There are occasional in-lecture quizzes, but the graded quizzes largely fail to foster any understanding of the material. An optional programming assignment was added halfway through the course; in a course about data mining, programming assignments should be front and center, not added as an afterthought to quell an outcry from students. Cluster Analysis in Data Mining is another disappointing entry in Coursera's data mining specialization. Although the course covers many different clustering methods, poor instruction makes it hard to gain a good understanding of the material unless you are extremely attentive or watch the videos several times. I give Cluster Analysis in Data Mining 2 out of 5 stars: Poor.
This class claims no prerequisites other than an intro-level knowledge of CS, but you'll have a much easier time if you come into it knowing certain things like the Linux command line and emacs. The entire course is structured around using a specific set of tools (Amazon Web Services, Linux, emacs, node.js, Heroku, GitHub) and following along with very long sets of instructions to get a basic crowdsourcing site online using those tools. In the early going, the course can be pretty frustrating if you don't already have some familiarity with Linux, screen, Git and emacs: without it, you'll have to spend a significant amount of effort learning simple keyboard commands to get anywhere. Once you learn those basic nuts and bolts, the rest of the course is mostly a matter of following a series of simple steps that are spoon-fed to you. Startup Engineering is probably a bit overly ambitious and tries to introduce too many different topics all at once, which can leave students scratching their heads and following along with instructions to complete homeworks without necessarily knowing what they are actually doing. I think the main issue people are having is that the course plays out more like an extended tutorial or series of resources than an actual course. The video lectures are extremely brief and not particularly insightful: all they do is go through the lecture PDFs, which are 30-50 page, detailed info dumps on various topics related to the tech stack they are using and business startups. They are useful resources, but they take an enormous amount of time to go through in detail, since they have links to other resources that themselves take hours and hours to look through. It also seems that the main lecturer is likely too busy to make comprehensive video lectures, since they are so short and usually late. In short, there is a ton of useful material here for a motivated self-learner to sift through, but it lacks focus and engaging video lectures.
I give this course 2.5 out of 5 stars: below average.
Linear and Integer Programming is a 7-week course covering linear programming in detail. The course focuses on teaching the simplex method for optimizing systems of linear equations with constraints for the first 4 weeks and then covers integer programming and applications. You should be comfortable with basic linear algebra and calculus before taking this course. The course includes optional programming assignments that allow students to build up their own simplex algorithms over the course of the class, but you can easily pass the course just taking the weekly quizzes. Linear and Integer Programming does an admirable job tackling a dense, dry subject. The instructors are easy to understand and explain confusing concepts well. The presentation style and video quality seem a bit dated, but that doesn't detract much from the learning experience. I must admit that my interest waned as the course went on because I took it out of curiosity rather than a preexisting interest in the subject. That was a mistake. You should not take this course for fun; take it if you really want to learn about linear programming and have the time to get through all the lectures, supplementary materials and programming assignments. Overall, Linear and Integer Programming is a great course if you want to learn about the simplex algorithm in depth and understand important considerations and applications of linear and integer programming.
UT.7.01x: Foundations of Data Analysis is a gentle, 13-week introduction to statistics and the R programming language. The course covers basic descriptive statistics, the normal distribution, sampling and hypothesis testing, including t-tests, chi-square tests and ANOVA. The course has no prerequisites, although you may need to spend some extra time learning the basics of R if you haven't used it before. Each week of Foundations of Data Analysis begins with a reading assignment, a couple of lecture videos with comprehension questions and an R programming tutorial. The videos tend to be in the 7-10 minute range and the tutorials typically total less than 10 minutes a week, so the total video content per week is usually 20-30 minutes. The videos are generally well-edited and the professor does a good job describing concepts simply and concisely. Each week has a prelab, lab and problem set that allow you to apply the concepts you learn in lecture and in the R tutorials. Each problem set consists of 3-4 mini case studies, so you'll probably end up spending most of your time on the labs and problem sets. The assignments are not very difficult, although many questions limit you to 1 or 2 attempts. You need a cumulative score of 70% to earn a certificate. Foundations of Data Analysis introduces new concepts at a relatively slow pace and gives students a good amount of practice through the labs and assignments. Concepts are explained well in lecture so the readings are not always necessary to do the activities, but they often provide extra depth and raise considerations that are not discussed in lecture. The course did have some hiccups with homework questions and auto-graders, and many questions expect rounded answers, which can result in frustrating off-by-a-fraction errors. In addition, the course uses an external forum system called Piazza instead of the normal edX forums, which I found to be a hassle.
Bottom line: UT.7.01x is a great place for a beginner to start with stats and R as long as you don't mind an external forum.
Supervised learning is the first of a 3-course machine learning series offered by Georgia Tech through Udacity. This course is part of an online master's in CS track, so it assumes students have a significant amount of coursework in math and CS. To get the most out of this course you should have at least a basic understanding of statistics, probability and linear algebra. You should also have taken an algorithms course that covers big O notation and the P vs NP question. Some prior exposure to machine learning and neural networks will be helpful. This is a difficult course to rate because it does a few things very well and some things not so well. On the plus side, the professors are very knowledgeable, have great chemistry and make a lot of jokes, some of which are quite humorous. I don’t think I’ve ever chuckled so much taking a MOOC before. The course provides a nice overview of some key topics in supervised learning, such as decision trees, linear regression, neural networks, k-nearest neighbors, ensemble learning and Bayesian networks. Content is a mixture of high level discussion, pseudocode for learning algorithms and math behind the algorithms. The difficulty level can vary quite a bit from one section to the next depending on your background knowledge. On the down side, this course does not have any homework or programming assignments (at least if you take the freeware course) so you don’t get any experience actually implementing the algorithms presented. The course teaches material by having one professor present topics to the other in each section. This method works OK, but since a professor acts as the student, it can move a bit fast since the “student” grasps all the material much quicker than a real student would. All in all this is a good course that covers a lot of ground in a relatively short amount of time, probably too short.
It could benefit from slowing down a little bit and providing at least a few short homework sets and/or programming assignments.
Udacity's Intro to Machine Learning is an introduction to data analysis using Python and the sklearn package. The course consists of 15 lessons covering a wide range of machine learning topics including classification algorithms (Naive Bayes, decision trees and SVMs), linear regression, clustering, selecting and transforming features and validation. As a self-paced course, you can take however long you wish on each lesson; some take less than an hour, while others can take several hours depending on how long you work on the mini projects. Intro to Machine Learning requires basic programming and math skills. Each lesson consists of a series of video segments and quizzes introducing a new topic followed by a mini-project that gives you a chance to work with code dealing with the topic at hand. Katie does most of the teaching and her enthusiasm helps keep the course engaging. The quizzes can, at times, seem patronizingly simple. The mini projects are a bit harder and contribute more to learning, although they occasionally lack adequate guidance and feedback to help students arrive at the expected output. The final project, and many of the mini-projects leading up to it, involve detecting persons of interest in the Enron scandal using a data set of emails sent by Enron employees. Interesting real-world data sets are always a plus. Intro to Machine Learning is an accessible first course in machine learning that prioritizes breadth, high level understanding and practical tools over depth and theory. You won't be an expert in any of the topics covered in this course by the time you're done, but you'll have a good foundation to build upon. If you are interested in taking a similar course with many interesting mini projects that uses the R programming language, try MIT's Analytics Edge on edX.
Coursera's Machine Learning with Andrew Ng is a logical next step to dig deeper into machine learning algorithm design and implementation, while Caltech's Learning from Data on edX is a great course if you are interested in machine learning theory.
Explore statistics with R is a 5-week introductory level course covering the basics of R, statistics and using R for statistical analysis. The course covers 3 main topic areas in 4 weeks--R basics, getting and manipulating data in R and statistical tests in R. The 5th week consists of a final graded assignment where you follow along with a research project conducted in R. Each week consists of a few short lecture videos followed by a series of graded quiz questions. The class awards a certificate if you achieve a total score of at least 60% on the quizzes and the final graded assignment. Explore stats in R offers some quality content, but it is too short to constitute a complete intro to R or stats. The professor speaks clearly, explains concepts well and seems genuinely excited to be teaching the course. I noticed he seemed active on the course's discussion boards, which is nice to see. The quizzes were too few and a bit too easy: they generally tested conceptual knowledge and did not require the student to do much in R besides copy, paste and run code. As a course focused on statistical operations, it didn't teach basic programming concepts in R like control flow and functions. This course could benefit from having a bit more content each week and beefing up the homework exercises to force students to do a little bit more in R on their own. Extending the course by a couple of weeks would also give the professor time to cover some neglected topics like programming basics in R and more on data visualization. With expanded content this could be a great course, so hopefully they'll make some tweaks and additions and offer it again.
MIT's Introduction to Game Design on edX is a course about crafting games with a focus on video games. I don't normally write course reviews until all the course content is available, but I feel compelled to write this review early (after the 3rd week of 6) because the course is diverging from my expectations so I may not finish it. I love to play board games and have designed a few board/card games in the past that never really got off the ground. I signed up for this course with the hope that it might help me improve a game I'm working on. The course description and promo video suggest that it would mainly focus on general game mechanics, prototyping and play testing, with little to no mention of video games outside of some digital prototyping. The first 3 weeks give some attention to both board games and video games, but the course is focusing more and more on video games with each successive week. The majority of guest speaker time and homework project time is devoted to digital games. The final 3 weeks cover digital prototyping, user interfaces and the business of games, which are likely to be heavily skewed toward video games over board games. Intro to game design has good content and for someone interested in making digital games, it would probably be a great course. Its main failing is that it didn't make it clear ahead of time that the focus of the course was going to be digital games. As it stands now, the course tries to split time and assignments between board and video games, which is not ideal. There are sections that are useful for board game designers here and there but it is getting difficult to weed through all the video-game oriented content. The course would be better split into 2 courses, one for board/card game design and one for digital game design.
Khan Academy's linear algebra lecture series provides a thorough introduction to linear algebra from the basics of vectors and matrices to projections into lower dimensional spaces, eigenvectors and eigenvalues. Khan has an impressive amount of knowledge and manages to be engaging with his voice-overs despite no actual face time. Khan's lecture videos all follow the same basic formula: he writes equations and works through problems with colored pens on a black background while talking students through everything he is doing. This format works remarkably well and I find it much more engaging than courses that use narrations over static slides. Seeing someone work through problems in real time is much more helpful and interesting than showing slide after slide full of information and equations prepared in advance. Such walls of text and equations can be intimidating and make it hard for students to know where to focus their attention. On the down side, real-time exercises can be time-consuming so it's not uncommon for videos to reach the 15-20 minute range. I went through parts of this course to supplement learning in edX's Linear Algebra: Foundations to Frontiers and often found Khan's videos and explanations more engaging and easier to understand than the videos in the edX course.
How to use Git and GitHub is a 3-week introductory course on the basics of the Git version control system. As a short course with only 3 lessons, it focuses on giving students a solid grounding in the basics of Git and doesn't stray too far into any advanced topics. Lesson 1 covers version control in general, checking differences between files, commits, cloning, git log and getting Git set up on your computer. Lesson 2 covers basics of repositories, branches, merging and merge conflicts. Lesson 3 introduces GitHub and related commands and considerations including remotes, pushing, pulling, forking and issues that may arise due to collaboration. How to use Git and GitHub does exactly what a short intro level course should do: stay focused on covering the basics in detail without taking diversions into esoteric features that are likely to confuse students and distract them from forming foundation knowledge. Sarah and Caroline do a good job explaining things at a level and pace appropriate for an intro course. The course has a bit more reading embedded in the video playlist than most Udacity courses. Also, many of the quizzes require you to run commands on your computer and copy and paste output back into Udacity, which can be a bit troublesome. It would be nice if they had an interactive Git environment similar to Code School's allowing you to do everything you need to do right in the browser. Still, How to use Git and GitHub is a great place to start if you are learning about Git for the first time.
Intro to HTML and CSS is a 3-lesson primer on front-end web development and design. Although the name of the course suggests you'll learn HTML and CSS basics, the content is actually focused on higher level web development and design concepts like web page and project structure, responsive design and web frameworks. The course spends very little time talking about nuts and bolts like different HTML tags and CSS properties. The content itself is well done; it just strays a bit from what you'd expect given the title of the course. It should be entitled "Fundamentals of front-end web development/design" or something similar. This course is hard to rate given that the content is good, but it doesn't quite fulfill the expectations set by the title. As such, I'm subtracting 1 star from what is otherwise a nice, short intro to front-end web development. If you want to learn HTML nuts and bolts, the first week of Udacity's "web development" spends a lot more time going over HTML tags. The HTML and CSS courses on Codecademy and Code School are also good options.
Age of Globalization is a broad overview of globalization that covers a wide range of topics including transportation systems, capitalism, regulation, social justice, the labor market, pop culture and sports. The course starts with a brief overview of the subject of globalization, world exploration, trade and imperialism. After the first two weeks, the remaining 10 weeks of content are released all at once, giving students the ability to skip around and sample topics of interest. Each week of content has an overarching theme and several sections that touch on various aspects of the theme or cite specific cases relevant to the theme. Each section typically starts with one or more short readings followed by a 5-20 minute lecture video, a discussion question and a short comprehension quiz. The content of Age of Globalization is well organized and the video and presentation quality are good. The professor is easy to understand and no background knowledge is needed to understand any of the content. The lecturer does speak a bit slowly, and given the non-technical nature of the content, you may want to turn up the video playback speed. The course has a very "liberal arts" feel to it and some of the content may be review for the educated viewer. Sampling the topics that interest you and skipping others may be a better use of your time than trying to complete all the content in order.
Introduction to Data Science is a misleading title for this course because it is not introductory level and it does not have a sensible flow that builds from one week to the next as you would expect from an intro course. Instead, the course acts as more of a data science sampler that introduces new topics each week that often have little to do with material covered in previous weeks. Lecture topics include relational databases, relational algebra, SQL, MapReduce, NoSQL, miscellaneous topics in statistics, machine learning, visualization and graph analytics. If that sounds like a disjointed smorgasbord of topics, it is. To make matters even more complicated, the programming assignments use three different languages: Python, R and SQL. This course is best suited for those who have some exposure to Python, R, SQL and statistics. If you have the appropriate background knowledge, this course touches on many interesting topics and while the lecturer's delivery is not great, he is quite knowledgeable and the material usually isn't too hard to grasp. Although the homework assignments require different languages and may take you a while to complete, they are rewarding. For instance, you'll work with real Twitter data you capture from the net, implement MapReduce operations in Python and participate in a machine learning competition on Kaggle.com. Introduction to Data Science is likely to be frustrating to those expecting a general intro to data science. The course jumps around too much and uses too many different tools to be a good first course in data science, but the breadth of topics covered and programming assignments make this course worth a look if you already have some exposure to data science or the tools the course uses. If nothing else, you can skip through the lectures and watch sections that are of particular interest to you.
I took this course on MIT OCW before it launched on the edX platform, so the material I worked with is probably a bit different from the most up-to-date version. This course is a comprehensive introduction to computer science and the Python programming language. Topics covered include: basic operations, control flow, loops, functions, recursion, algorithm complexity, divide and conquer, basic sorting algorithms, dynamic programming, the knapsack problem, object oriented programming, simulation, random walks and Monte Carlo simulation. The OCW course was fairly high level and spent more time talking about computer science concepts than Python syntax, which made the programming more difficult than it could have been had it been tailored for online learners. The newest edX version has better resources for learning the nuts and bolts necessary to complete the course. This course is a lot of work and more comprehensive and difficult than many other intro CS courses.
Introduction to Linux is a self-paced course offered by the Linux Foundation geared toward students interested in using Linux for the first time. The course covers a wide range of topics including installing Linux, differences between Linux distributions, the GUI, the command line, various system operations and Bash scripting. You can use any of three different Linux distributions--Ubuntu, CentOS and openSUSE--to follow along with the material. The content consists mainly of static slides with occasional video tutorials, exercises and comprehension questions; 100% of the course grade is based on an easy 30-question final exam. The slides are quite informative and the exercises are generally well done, but it would be nice to see more video content. The entire course probably has around an hour to an hour and a half of video. Overall, Introduction to Linux is an informative yet impersonal offering that is more akin to reading an operating manual than taking an open course with human teachers and students. The content will likely prove useful as reference material, but the lack of human connection makes getting through it a bit of a slog.
Intro to sabermetrics is a beginner course in baseball analytics published by Boston University on the edX platform. The course is organized into 4 different content tracks: a statistics track, a sabermetrics track, a tech track, and a baseball history track. The statistics track covers basic statistical concepts like mean, median, measures of spread, regression to the mean and correlation. The sabermetrics track introduces a variety of concepts and computed statistics in baseball analytics like on-base percentage, slugging, other hitting metrics and converting runs to wins. The tech track focuses on teaching SQL database queries using an interactive MySQL environment as well as R basics. Each of the course's 6 weeks of content starts with a brief overview of the material to be covered in each track. SABR101x is a good intro to sabermetrics, but it suffers from several issues common to first-run MOOCs that held it back from being a great course. The course has good instruction and the organization of the materials into different tracks was nice to let people focus on areas of interest. On the down side, information in the videos was sometimes hard to make out due to small text size and poor color choices with backgrounds and pens. The difficulty level also seemed a bit unpredictable: the statistics track was very basic while the tech track gets into SQL and R at a rate that is probably a bit too fast for people with no background knowledge. In addition, tech exercises sometimes suffered from ambiguous wording and automated graders initially expected too much accuracy on rounded answers. Many of these kinks could be straightened out for a second offering of the course. If you love baseball and have any interest in baseball analytics, you will probably enjoy this course. If you're mainly interested in analytics and picking up new technical skills, the SQL tech sections and SQL sandbox are the highlights of the course.
This is another course that had dozens of reviews before the course started because people were freaking out that materials weren't available at the stroke of midnight the day the course started. The course is just wrapping up today, so don't put too much stock into reviews made more than a few weeks before now. I'll start by saying I don't normally take courses that aren't focused on technical topics like computing, math or science because I don't find many of them to be of practical use. When I saw a course entitled "Becoming a resilient person" I was quite skeptical because it sounded like the sort of fluff course you'd take to pad your schedule in college to raise your GPA. I decided to sign up for it anyway because the time commitment was minimal and I wanted to check out the first week's material to confirm my suspicions. Eight weeks later, I must say I'm glad I took this course. It discusses many important life topics that affect happiness and wellbeing such as values, goals, mindfulness, gratitude, managing emotions, making therapeutic lifestyle choices and making meaningful social connections, and it takes time to give actionable advice for making positive changes in your life. On the technical side, the lecture videos are well organized, the instructor is always on screen and he delivers information clearly. The material is very accessible, so almost anyone could take this course. There are short, easy quizzes each week and homework that usually involves using what you've learned in your life or explaining concepts discussed in lecture to a willing ear like a close friend or family member. Given the low time commitment required, this MOOC provides great value. The forums were littered with posts about how the course helped people make positive changes in their lives. There's no guarantee you'll see any positive changes, but for an investment of 8-15 hours, you have a lot to gain and not much to lose.
Practical machine learning is the 8th course in the 9-part data science specialization offered by Johns Hopkins on Coursera. This course introduces machine learning in R, including the basics of prediction, splitting data into training and testing sets, regression, trees, random forests and boosting all in the span of 4 weeks. The course focuses on using the caret package in R to apply machine learning algorithms. Similar to other courses in the data science specialization, the course content is mainly static slides with voice-overs, but thankfully the slides are generally not overly cluttered and the voice-overs are of decent quality. The course has a lot of good information on how to use R to apply common machine learning techniques to data, but you aren't going to gain a deep understanding of how the machine learning methods work. "Practical" in this case means "learn how to use the tool, not how it works." I suspect students coming into this course with no prior knowledge of machine learning will find that the lectures jump from one topic to another too quickly as the course goes on. Taking a course that covers machine learning theory, like the 3-part machine learning series from Udacity, will give you a deeper understanding of the methods introduced in this course. Practical machine learning does a pretty good job introducing machine learning topics in a limited amount of time, but the coverage is too brief to gain a solid understanding of many of the methods presented. This course would have been much better if it was 8 weeks and had at least 1 hour of solid lecture content per week with interactive exercises or homework. If you’re looking for an excellent practical machine learning course that spends enough time on each topic and has enough homework to really help students learn, check out MIT's Analytics Edge on edX.
Introduction to databases is a self-paced course offered by Stanford through Coursera that provides a thorough overview of databases, focusing on relational databases and SQL. This course was created in late 2011, so the presentation and some of the content are slightly dated, but it is still a very good course. The videos stand up pretty well when compared with modern MOOCs and you'd be hard-pressed to find a more accessible intro to databases elsewhere. The main shortcoming of this course is that it only briefly touches on topics like NoSQL systems, big data and MapReduce, which are areas that have advanced quite a bit in the last few years; these topics would warrant more coverage in an updated course. The content can get a bit slow at times, but that's pretty much unavoidable given the subject. You might not want to go through all the lecture videos depending on your background knowledge and goals; the content is well-organized by subject, making it easy to stick to the topics you want to learn.
Developing data products is the final course in the 9-part data science specialization offered by Johns Hopkins on Coursera. This course introduces several tools you can use to put R code on the web, into slideshows and into R packages, including Shiny, rCharts, googleVis, slidify and RStudio Presenter. Although the course is listed as 4 weeks, it only has 3 weeks of lecture content, with one week devoted to giving students time to work on the course project. Unlike previous courses in the data science specialization, this course is not taught by a single professor: each of the 3 professors involved in the data science specialization leads a few lectures. This course provides a decent overview of some useful tools for integrating R with the web and in presentations, but it covers too many different tools in too short a time without any exercises to help students practice using the tools presented. You'll have to spend a lot of time on your own exploring the tools discussed to really learn how to use them. It's nice to be aware of the kinds of tools that are out there and have some basic information on each one to get started, but in keeping with the theme of the entire data science specialization, coverage is only skin deep. Now that I've gone through all 9 courses in the data science specialization, I can say that on the whole, the data science track is disappointing. On the plus side, you will gain basic R proficiency if you complete the R programming, getting and cleaning data, reproducible research and exploratory data analysis courses. That said, too much of the material is poorly presented with a lack of instructor face time and overly cluttered slides. The courses routinely try to cover too much material too fast and skimp on content in the later weeks. There are no in-lecture quizzes and few interactive exercises or quality homework problem sets. 
A cynic might question Johns Hopkins' motivation in offering the data science specialization: making 9 short courses that they can rerun each month and charge $50 a pop to anyone interested in verified certificates smells a bit like an experimental cash grab. Regardless, there are several other MOOCs out there that cover the same topics better.
Regression Models is the 7th course in the Johns Hopkins data science specialization track on Coursera. This course is essentially identical to the statistical inference course in terms of structure, presentation and quality: the entire course consists of dull, information-packed slides with mediocre voice-overs. It seems like half of the course consists of slides with verbose math expressions in summation notation and the instructor telling you that you don't really need to understand them unless you are interested in the math behind the models. As with other courses in the track, there are no in-lecture quizzes or interactive exercises and there is no instructor face time. Overall this is a disappointing course that probably won’t keep your interest long enough for you to bother completing all the videos, much less the quizzes and the project. If you’re looking for other places to learn about regression models, the last two weeks of Duke's Data Analysis and Statistical Inference cover regression, as do the first few weeks of MIT's Analytics Edge. I highly recommend both of those courses. Regression Models does cover regression in a bit more detail, but given the poor presentation you'd probably be better off reading Wikipedia. *Update: Johns Hopkins has recently released an interactive learning package for R called Swirl that provides a series of exercises for this course and some of their other Coursera offerings. The Swirl exercises for this course help reinforce the topics in a way that is much more engaging than the lectures. I give the Swirl exercises for this course a score of 3/5 stars. It would have been nice if the Swirl package was available from the beginning.
Machine Learning 3—Reinforcement Learning is the final part of a 3-part machine learning course offered by Georgia Tech through Udacity. This course is the shortest of the 3 parts, spanning only 4 lessons that cover Markov processes, reinforcement learning in general and game theory. This part is not quite as strong as the previous two parts of the machine learning course because it is too short and spends a bit too much time covering basic game theory. There are no homework exercises but there are a few in-lecture quizzes here and there. If you went through the first two parts of the machine learning course, there's no reason not to finish it off by taking this part: the professors still have great chemistry and it doesn't take too long to complete. Just be aware that about half of the content is an intro to game theory, so if you've taken a course that covers game theory before, half of this course will be review.
The science of everyday thinking is a fun, light course covering how people think, learn and make decisions. Major topics include illusions and cognitive biases, intuition, learning, experimentation, belief and how scientific thinking can improve decisions. The course consists of 12 modules that each focus on a particular theme with a series of video lectures interspersed with interviews of experts. Each section concludes with a 10-question quiz, an invitation to participate in the discussion forums and a video showing some of the work of on-campus students. The course videos are well done, both in terms of content and the quality of the video footage itself. In most MOOCs I find guest lectures to be of little value: they are usually tacked on as bonus content that is not always directly relevant to the main lectures. This is the first MOOC I've seen where guest speakers fit well into the flow of the main lecture content and enhance the overall course experience. On the downside, I did not care for the quizzes because they ask too many questions that require you to remember specific lines, facts or definitions from the lecture videos. I also felt that the course tapers off a bit at the end: the final 4 weeks were not quite as interesting as the first 8. Still, the science of everyday thinking is a very good course that provides interesting insights into the human thought process without a major time commitment.
I'm not sure how there are already 20 reviews for this course when it wasn't released until today, but I just finished going through the main lectures and I have to say this course is very disappointing. I wasn't really sure what to expect since the course had 1 week listed as the duration: it is a short, one-sitting type of course that is just one of several installments the lecturer is planning to offer. The main lecture content raises many interesting concepts about big data and social physics, but there is not enough content and each section concludes with the professor referencing his book and telling you to go read it to learn more, even though the book is "not required." Basically, you’re given a bunch of interesting ideas and visualizations to whet your appetite and then the lectures fade to a picture of the book. I'm not sure whether this course is really trying to teach students or masquerading as a course while functioning as an infomercial for the book. It's too bad because it is a fascinating topic and I feel that the lecturer could have made a good course if he took the time to create a 6-8 week MOOC that covered all the material in depth.
Linear Algebra - Foundations to Frontiers is an introductory linear algebra course that teaches linear algebra in the context of computing. If you don't have any familiarity with programming or Python, the computing component is going to be hard to follow. You can, however, skip all of the programming parts and just go through the lecture videos and quizzes. Topics include vectors, linear transformations, matrix-vector operations, matrix multiplication and inversion, vector spaces, orthogonal projection and bases, and eigenvalues and eigenvectors. LAFF requires a major time commitment. Unless you are already familiar with some of the topics, you'll probably spend 5-8 hours a week. It is clear that a tremendous amount of effort went into producing the materials for this course. There are multiple homework exercises after almost every video and most weeks have one or more programming exercises where you implement and visualize linear algebra functions using tools the instructors have created. The instructors were also active on the forums, which was nice to see. If I were to judge this course solely on the amount of content and quality of exercises, it would be 5/5. That said, I didn’t find the instructor engaging on a human level. Math can be boring; instructors who are excited about the topics they teach can go a long way toward mitigating the dryness. The instructor was robotic in his presentation and I often found the lectures hard to follow. When I decided to watch some of Salman Khan’s linear algebra videos on Khan Academy to review for the final, I found his presentation of the same concepts more engaging and easier to understand. I came out of this course feeling like I didn’t learn as much as I could have because the material is not always presented in a way that is easy to follow and my interest waned from time to time. 
LAFF provides everything you need to build a solid foundation in linear algebra—if you are able to remain attentive despite the dry presentation.
MIT’s The Analytics Edge is a course focused on using statistical tools to gain insight about data and make predictions. The majority of the course teaches analytic methods using the R programming language, but the final 2 weeks deal with solving optimization problems using spreadsheet software (LibreOffice or MS Excel). The course runs 11 weeks and covers R basics, linear regression, logistic regression, decision trees, text analytics, clustering, visualizations and both linear and integer optimization. The Analytics Edge is a meaty course. It has a lot of content each week and it’s not easy to breeze through things like it is with many other MOOCs. There are graded quizzes after each video lecture and each week of new material has 4 fairly lengthy case studies to complete. One week is devoted to an analytics competition while the final week is reserved for a 4-part final exam. Some students on the forums claimed they were spending 10 to 15 hours a week on this course. Coming into the course with basic knowledge of statistics and R helps a lot. It should be noted, however, that this course is not too math intensive. It doesn't spend a lot of time talking about formulas or nitty-gritty mathematical details; it mostly teaches you how to apply statistical functions and methods and interpret the results. Although this course requires a serious time commitment, it is time well spent. The Analytics Edge is an excellent course that teaches a bunch of practical statistical tools and actually gives you enough practice using them through the lengthy homework exercises to gain some confidence with them and remember how to use them. Too many courses info dump syntax and concepts, but don’t back them up with practical problems to let you use what you've learned. The homework problems for this course are very well crafted and look at a variety of interesting data sets from basketball stats to tweets about Apple. 
I can’t even imagine the amount of time that went into putting all the homework exercises together; kudos to the team at MIT for their hard work. If you’re interested in learning some practical analytic methods that don’t require a ton of math background to understand, this is the course for you.
Reproducible research is the 5th course in the Johns Hopkins data science track on Coursera. As the title states, this course is all about making research and data analysis reproducible using the R programming language. The first 2.5 weeks of lecture material in this course is great. It provides a well-organized overview of how to create reproducible research in R using R Markdown and the knitr package, taking plenty of time to talk about best practices. Thankfully, Roger Peng has added in a little box with his face in it as he talks over his slides for many of his videos, which makes the content a lot more engaging than it is in some of the other Johns Hopkins courses that only have voiceovers. The final 1.5 weeks of lecture video material is not as useful or engaging and seems a bit lazy in that week 4 takes the form of recordings of lectures given sometime in the past. The videos in the second half of week 3 only have voiceovers and they have an echo to them that makes them hard to listen to. All in all, the first 2.5 weeks of this course are definitely worth checking out if you have any interest in learning about reproducible research, but you might want to skip through some of the content at the end of the course.
Intro to Object Oriented Programming is a short overview of object oriented programming using the Python programming language. This is a beginner level course but it assumes you have a basic grasp of programming in Python. It would be a good course to take after completing the first few weeks of an introductory Python course like Udacity's CS 101, Rice's "An Introduction to Interactive Programming in Python" on Coursera or MIT's "Introduction to Computer Science and Programming Using Python" on edX. Intro to OOP provides a gentle introduction to using classes in Python that starts by building up your confidence in creating programs with simple yet interesting examples like drawing lines, sending text messages and filtering messages for profanity. The instructor uses built-in Python class objects to introduce the concept of classes before having students create their own classes. In the final section, you'll use classes to make a basic movie website that plays trailers for your favorite movies. The course touches briefly on some advanced topics in object oriented programming like inheritance and method overriding. This course is very well organized and the instructor explains OOP concepts in a way that makes them easy to understand. You'll also learn about the structure of Python programs so that you better understand where functions and classes reside in Python and its modules. The instructor frequently refers to the Python docs, Stack Overflow and Google to figure out how to do new things, which is a good skill to learn. Overall this is a great little course that could take anywhere from 5 to 15+ hours depending on your experience level and how much time you want to spend working on projects.
This class provides an overview of Salesforce, a software platform that lets you create applications in the cloud. Salesforce lets different groups of people like managers, employees, contractors and customers interact through web applications and gives the user a variety of tools for aggregating, crunching and displaying data. The course is presented in a novel format: Udacity's representative Andy assumes the role of the student and learns all the material at the same time as online learners. A representative from Salesforce, Samatha, teaches Andy step by step, which makes it easy for an online learner to follow along and do everything that Andy is doing. I found that the format worked quite well. My main gripe with the course is that they could do a better job explaining how you would actually deploy and use a Salesforce app in real life. It would be nice if the course was a bit longer or if there was a follow-up class that goes into more detail about using Salesforce and advanced features. I feel like they breezed through all the basics and just when things started getting really interesting, the class ended.
I signed up for this course just to see what it was like, not expecting to actually complete it. The only reason I completed it is that there's only about 30 minutes to an hour of content per week. While the course does introduce some interesting combinatorial games and concepts, the content is thin and the professor is not always easy to follow. It's tempting to give the course a higher rating simply because I find the subject enjoyable and the professor is amiable, but that doesn't make up for the lack of content and lackluster presentation.
Exploratory Data Analysis is the 4th course in Johns Hopkins’s data science specialization track. I'm writing this review after completing all the lectures and quizzes; I'm not planning to complete the projects. The first 2 weeks of this course provide a thorough overview of plotting in R using the base graphics package, the lattice package and the ggplot2 package. Week 3 takes a sudden detour into data clustering and the fairly advanced topics of principal components analysis and singular value decomposition, only to jump back to plotting with a section on color. The clustering section seems a little out of place since there is not any introduction explaining the purpose of clustering. What's worse, the SVD and PCA sections require a fairly high level of linear algebra knowledge to understand, which is not a prerequisite for this course. I suspect that section will leave many students scratching their heads. Week 4 consists of 2 case studies where the professor shows you how to perform an exploratory analysis on a couple of different data sets. If this course only consisted of the plotting lectures I’d give it a 4 out of 5. The plotting lectures that make up the bulk of the course are well done and this course provides more instructor face time and live examples in R than any of the 3 courses in the first wave of the data science track. Unfortunately, there are no interactive exercises or in-lecture quizzes and the principal components analysis and singular value decomposition sections are too advanced for this course. It would have been better if they left the SVD and PCA functions as black boxes in R and simply explained in general terms what they do and how to interpret their output. Still, the quality overview of R plotting makes this course worth a look.
This course is a perfect example of how having smart instructors who are passionate about what they are doing is not enough to make for good instruction or a good class. Udacity's course offerings are generally top notch in quality, but this one seems to be the lemon of the lot. The course is structured around an HTML5 game that the profs created and quizzes are centered around having you fill in bits of code into a skeleton of hundreds of lines of their game code. The video lectures are too brief and don't discuss commands at a pace that allows students to learn what they are doing before taking quizzes expecting them to use those commands. Using an already-made game is a poor instructional decision. Building something from the ground up, piece by piece, over the course of a class is a much better system for learning that doesn't confuse students with tons of lines of unfamiliar code. The profs seem to assume that students should know much more than they actually would having watched the video lectures. Picture a bunch of scientists who are so wrapped up in their own world that they are unable to explain things in terms that a novice can understand. I love Udacity, but this is one to avoid.
R Programming is a remake of Computing for Data Analysis, another course offered on Coursera by the same instructor. This course covers R basics such as R data types and objects, reading and writing data, control flow, functions, scoping, dates, loops, debugging tools, simulation and code profiling. The slides and lectures are a bit smoother than Computing for Data Analysis but the content is mostly the same. This course has good information but suffers from a lack of instructor face time and heavy use of static slides with voiceovers, which are less engaging than videos of instructors actually running the commands they are talking about. Additionally, there are no in-lecture quizzes or interactive exercises to help you absorb the material as you go along. If you want to get as much out of the course as you can, I recommend that you follow along with RStudio open on a second screen or window and try out commands discussed as you watch the videos. Overall, this is a decent intro to R, but it is not particularly engaging. Try R from Code School is a much more engaging, albeit brief, intro. If you take this course and want to apply what you've learned, or want to learn R somewhere else, consider MIT's Analytics Edge on edX, Duke’s Data Analysis and Statistical Inference on Coursera and Exploratory Data Analysis on Udacity. Each of these courses teaches R basics in the context of learning other things like predictive modeling, statistics and data analysis.
Duke’s Data Analysis and Statistical Inference is an introduction to statistics with an optional computational component using the R programming language. The course runs about 8 weeks and covers a considerable amount of ground in that time. It starts with the basics of data and data collection methods but quickly moves on to cover probability, the normal distribution, the binomial distribution, hypothesis testing, confidence intervals, Z and T statistics, ANOVA and chi-squared tests and linear regression. The course is a bit of a whirlwind tour that packs a lot into each lecture. The PDF slides that go along with the videos are a great resource to review the information dumped in each lecture. Many students complained that the course requires more time than the original estimated amount of around 6-8 hours per week. The course was later updated with an estimate of 8-10 hours per week, which is on the conservative side. If you come in with some prior knowledge of stats and R you can get through in 3-5 hours per week. The professor is engaging and does a good job going through the material while providing adequate face time. The slides are very informative and the video quality is excellent. There are periodic in-lecture quizzes that help test your understanding of the material as you go along. I felt that the frequency of in-lecture quizzes was just about right in this course. Grading is based on performance on weekly quizzes, one midterm and one final exam. You need a cumulative grade of 80 percent or more to get a certificate and you only have 1 attempt on the exams, so it is a bit harder to earn a certificate in this course than it is for most MOOCs. If you choose to go the computational route, a portion of your grade is based on 8 programming labs using the R programming language. You can do the labs on your own or use a convenient web-based programming environment provided by the instructor. 
The labs provide a basic introduction to R and each one explores some of the concepts introduced in the lectures. The labs take about 30 minutes to an hour and a half depending on your level of experience with programming and R. In the computational track you’ll also complete a final project involving a statistical analysis of two variables, either from a data set provided by the instructor or a data set you find on your own. The project lets you use the concepts you’ve learned both in class and in lecture on your own. I suspect the project is a bit intimidating to those who are new to R because it involves more computation than the labs and you don’t have the training wheels that the labs provide. The project grade is based on the median score of 3 or more peer assessments. This is a great course for anyone looking to learn statistics that moves fast enough not to bore those who know a bit of statistics coming into the course.
Introduction to Mathematical Thinking is a great course that covers several topics that are often not covered in high school math including proofs, logic, quantifiers and beginning real analysis. The professor does a good job engaging students with material that is quite dense, with a lot of face time, encouragement and walkthroughs of solutions and proofs. I didn't anticipate actually completing the entire course when I signed up; I did mainly because the professor is so good. The course also includes some interesting supplementary material about the pros and cons of MOOCs.
This course begins with an introductory week about social goods and then goes on to cover 4 major problems the world faces over 4 weeks: poverty, climate change, disease, and gender inequality. The course is basically 2-3 hours of lecture per week with some writing assignments. The class does a good job covering some of the major issues the world faces and outlining some of the things people can do to try to solve them.
Gamification provides an overview of using game elements in non-game contexts. The primary purpose of gamification is to give people extra encouragement to do something that they may not be adequately motivated to do on their own. For instance, a business that wants to improve the overall health of its employees to reduce health care costs might introduce a gamified system to encourage workers to exercise. The course does a good job summarizing the basics of gamification, such as how gamification can be useful, how gamification differs from games, elements of gamified systems, motivation and psychology of players and limits to gamification. It also lays out a basic design framework for creating gamified systems and covers basic design choices, risks and possible legal concerns. Gamification is an easy, fun course that benefits from quality instruction and an interesting topic. You’ll come away from this course with a solid understanding of what gamification is, how it can be employed and how to think about designing gamified systems. Afterward, you’ll probably start recognizing gamification all around you. On the downside, the material does not get particularly deep. Oftentimes the class feels like it is providing structure to the ideas that you probably already have about games and gamification, rather than presenting new insights. The course focuses a lot on the organization and formalization of the knowledge and intuitions you probably already have. This is not necessarily a bad thing; there just aren’t any “Aha” moments where you learn something particularly insightful that changes your way of thinking. Your experience may vary. Another quibble I had with the course is that 35% of the grade is based on peer-assessed writing assignments. Students seemed to stray from using the rubric and instead assigned grades based on their own subjective opinions of whether they liked your ideas or not. 
Overall, this is a good course, but if you want a certificate you should plan to score 90% or more on all the quizzes and the exam just to be safe.
Programming languages uses the goal of writing a web browser as a platform to teach topics related to writing programming languages. The class covers the process of lexing strings of HTML to transform them into sequences of tokens and then parsing those tokens into syntax trees that can be passed to an interpreter to display the web page represented by the HTML. Wes Weimer is a good teacher and brings a fun attitude and some cringe-worthy jokes and drawings to the table. He has a habit of throwing in random historical and other educational tidbits to lectures, which can be good or bad depending on your mood. His wit helps to mask the dryness of the material and the fact that it may not be especially useful to you unless you plan to build a language, a browser, a parser, etc. yourself. It is good, however, to have a basic understanding of how computers process language, and certain topics like regular expressions and list comprehensions are very useful outside of the context of this course.
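For readers who haven't seen the lex-then-parse pipeline before, here is a rough sketch of the idea in Python. It is not taken from the course: the token pattern, the `lex` and `parse` function names and the tuple-based tree are all my own assumptions, and it only handles a toy subset of HTML (tags without attributes, plus text).

```python
import re

# Hypothetical token pattern for a toy HTML subset:
# group 1 = "/" for closing tags, group 2 = tag name, group 3 = text run.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def lex(html):
    """Turn a string of simple HTML into a list of (kind, value) tokens."""
    tokens = []
    for closing, tag, text in TOKEN_RE.findall(html):
        if tag:
            tokens.append(("close" if closing else "open", tag))
        elif text.strip():
            # Whitespace-only text runs are dropped.
            tokens.append(("text", text.strip()))
    return tokens


def parse(tokens):
    """Build a nested tree: elements are (tag, children), text is a plain str."""
    root = ("document", [])
    stack = [root]  # the innermost open element is always stack[-1]
    for kind, value in tokens:
        if kind == "open":
            node = (value, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "close":
            if len(stack) == 1 or stack[-1][0] != value:
                raise ValueError(f"unexpected closing tag </{value}>")
            stack.pop()
        else:
            stack[-1][1].append(value)
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1][0]}>")
    return root


tokens = lex("<p>Hello <b>world</b></p>")
tree = parse(tokens)
```

An interpreter would then walk `tree` and decide how to render each node. A real lexer also has to deal with attributes, comments and malformed markup, which this sketch ignores.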
A short intro to Hadoop and MapReduce. Similar to the design of everyday things course, the course was so short that I did not feel motivated to do the fairly involved final project.
Computing for Data Analysis is an introduction to the R programming language for people who already know how to program. The course description makes it seem like the class is intended for everyone, even those who do not know how to program at all; this course is not designed for people with zero programming knowledge. The lectures move through material at a rate and level of sophistication that assumes prior programming experience. You might be able to get through this course without prior programming knowledge with a lot of extra work, but doing so would be an inefficient use of your time. If you have no prior experience, a true introductory class would be a better idea. The course provides a decent overview of R, but the format is not ideal. The lectures are generally 10-25 minutes with no interactive programming exercises to do as you go along. It’s a good idea to follow along and do the commands he talks about on your own so that you at least get some practice. There’s some good material in the lectures, but they leave a lot out as well. You’ll probably end up spending a substantial amount of time Googling about basic R functions to complete the programming assignments.
This is a great little course on developing for the mobile web with fluid, adaptive and responsive design. It will take around 1.5 to 3 hours to complete depending on your experience level and how much you go back and look through the material to answer the challenge questions. You should have basic knowledge of CSS and HTML before taking this course. I must say I continue to be impressed by the quality of the material on Code School. I've done courses on just about every online learning site there is--Codecademy, LearnStreet, Khan Academy, Udacity, Coursera, edX, etc.--and I find Code School's materials to be among the most accessible and polished as an overall web experience. Everything from the videos to the slides and exercises is well done and relevant and, most importantly, explained and laid out in a way that makes it very easy to understand. They also have a good hint system that keeps you from getting stuck too long. A+
I believe the first running of the class was in late 2012, so the content is still quite current. The course lasts 12 weeks and walks you through a wide range of major topics related to how computer networks work, from the physical layer of sending signals on wires or through the air to network security and quality of service. The class provides insight into how many things you likely use every day actually work, like Ethernet, Wi-Fi, routers, switches, hubs, virtual private networks, content distribution networks, peer-to-peer services, and of course, the domain name system and the Internet itself. The lectures go into a fair bit of technical detail about how different aspects of computer networks function. In some cases the extra detail is enlightening; in others it can get a bit tedious. Overall, the class was definitely worth taking, even though it does not require any programming. I'd recommend this course to anyone who wants to learn how computer networks work in more depth than you'd gain in your everyday life as a web user.
Design of Computer Programs is an awesome class for a novice to intermediate Python programmer to learn some new tools and techniques. The biggest problem with this course is that Udacity originally promoted it as being the next step after CS 101. As a result, the initial offering of the class had a ton of newbies who quickly got lost in the new topics that Prof Norvig introduces in relatively rapid succession. Udacity has since recategorized the class as “advanced.” I’m not sure I’d necessarily call it advanced, but you should probably have more than just an intro course under your belt before attempting it unless you are willing to work hard and slowly. You will do a lot of programming in this course, mostly in the context of creating and solving games like Scrabble, Boggle and poker. Topics and techniques covered include Python list comprehensions, generators, decorators, tuple unpacking, lambda expressions, regular expressions, testing, profiling and optimization. If that sounds like a lot to cover in a 7-week course, you’re right. This course is a lot of work and covers a lot of ground, but it will also teach you a lot.
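For readers wondering what several of those techniques look like in practice, here is a minimal sketch in plain Python. It is not taken from the course materials: the function names, the two-character card encoding and the sample hand are all made up for illustration, loosely in the spirit of the poker exercises mentioned above.

```python
import functools
import re


def memo(func):
    """Decorator: cache results keyed by the positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memo
def card_rank(card):
    """Map a card such as 'TC' to a numeric rank (2 through 14)."""
    rank, _suit = card  # tuple unpacking of a two-character string
    return "--23456789TJQKA".index(rank)


def hand_ranks(hand):
    """List comprehension: ranks of every card, highest first."""
    return sorted([card_rank(card) for card in hand], reverse=True)


def all_pairs(items):
    """Generator: lazily yield every unordered pair of items."""
    for i, first in enumerate(items):
        for second in items[i + 1:]:
            yield first, second


# Hypothetical sample hand, encoded as rank + suit.
hand = "6C 7C 8C 9C TC".split()

# Regular expression: check that every card is well formed.
is_valid = all(re.fullmatch(r"[2-9TJQKA][CDHS]", card) for card in hand)

ranks = hand_ranks(hand)
is_straight = len(set(ranks)) == 5 and max(ranks) - min(ranks) == 4
is_flush = len({suit for _rank, suit in hand}) == 1

# Lambda expression used as a sort key.
highest_card = max(hand, key=lambda card: card_rank(card))
pair_count = sum(1 for _ in all_pairs(hand))
```

Each piece is small on its own; the difficulty in a course like this comes from meeting all of them in quick succession and being expected to combine them.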
Udacity's intro CS class is one of the best CS intros on the web. I've taken MIT 6.00 and Harvard CS50 and gone through Coursera and LearnStreet intro courses, and I'd say this one is the best in terms of actually learning how to program. The format of short instructional videos and quizzes on Udacity is the best format for learning CS on the web, when executed well (other than building/researching things yourself). It should be noted that this course focuses mainly on learning Python and not on the theory of CS. I think for an intro CS class it’s okay to focus more on gaining confidence with the basic nuts and bolts of a language than actually getting into the nitty-gritty of CS itself. Intro courses offered by universities get more into CS theory, but spend less time on teaching you how to actually program, which can make them a bit frustrating and leave students feeling like they have to educate themselves on the programming side of things. This course is entirely self-contained: you don't need to go anywhere else or learn on your own to get through it. It also doesn't take too long to complete, so it is a perfect precursor to more theory-heavy classes that don't spend enough time on the nuts and bolts.
Neural Networks and Deep Learning is the first course in a new deep learning specialization offered on Coursera and taught by Coursera co-founder Andrew Ng. The 4-week course covers the basics of neural networks and how to implement them in code using Python and numpy. The course page states that it only requires basic Python programming knowledge, although any experience you have with machine learning, linear algebra and calculus will be helpful with gaining a deeper understanding of the material. You can access the quizzes and programming assignments without paying for the full course, but if you want to submit them for grading and get credit as having completed the course, you have to pay for the certificate. Neural Networks and Deep Learning starts with a short introduction to deep learning in week 1, followed by 3 full weeks that build your understanding of neural networks by starting with logistic regression implemented with the same structure as a neural net in week 2, shallow nets in week 3 and deep nets in week 4. Key topics include computational graphs and derivatives on graphs, gradient descent, vectorizing code, neural network representations, activation functions, backpropagation and deep nets. The course touches on high-level concepts and considerations to frame learning, but the majority of the content focuses on the low-level nuts and bolts of neural network structure and how to translate it into code. Each week after the first has roughly 1-2 hours of lecture split up into 5 to 15 minute video segments. In each segment, Andrew Ng appears on screen and gives a brief overview of what the video is going to cover and then he discusses the topic with voice-overs while writing on white slides, followed by a brief outro where he reappears and summarizes key takeaways. 
There is a lot of handwritten information and notation in the lectures, which means some students may find certain lectures difficult (or boring) to follow, but he explains things very well and the notation is there to help you gain a concrete understanding of the structure of neural nets and prepare you for working with them in the programming assignments. The production value of the videos is fairly low as the intros and outros seem to be recorded with a non-widescreen SD camera and the vast majority of content is simply Ng writing on mostly blank slides. The production style is reminiscent of his original machine learning MOOC which was released back in 2012. Still, the logical organization of the content combined with Ng's masterful knowledge and lucid explanations means the relatively rudimentary production doesn't detract from the course's value. Weeks 1-3 also include optional guest lectures with different "heroes of deep learning." The programming assignments in Neural Networks and Deep Learning are very well done, providing great instructions, explanations and examples. You can access all of the assignments as a freeware student, so even though the course won't be listed as completed when you finish, you can still work through them and learn all the same things as paying students. The assignments are heavily structured, giving students complete code skeletons of all required functions and only requiring students to implement specific key lines of code which are described in detail. In other words, most of the difficulty in implementing neural nets--such as the logic and structure of the code and aligning matrix dimensions--is taken care of for you so you don't need to be a strong programmer to complete the assignments. 
This keeps the assignments moving along at a nice pace and should help keep students from getting stuck for too long. While you may struggle to implement neural nets from scratch yourself after completing this course, it shows you the tools you would need to do it. And perhaps more importantly, it gives you insight into how neural nets are working under the hood, which is good to know even if you end up using a package to build them. Neural Networks and Deep Learning is the best introductory course on neural networks on any of the main MOOC platforms, and it is accessible to about as broad a group of students as possible given the nature of the material. The course isn't perfect: notation-heavy videos can get tedious and it sometimes eschews mathematical details. It also makes a few questionable decisions such as putting a 40 minute interview of Geoffrey Hinton at the end of the first week, most of which you will not understand unless you've seen neural networks before and have familiarity with his work. That said, if you want to learn about neural networks and how to make them in code, this is the right place to start. I give Neural Networks and Deep Learning 5 stars out of 5: Excellent.
Machine Learning: Clustering & Retrieval is the fourth course in the University of Washington's 6-part machine learning specialization on Coursera. The 6-week course covers several popular techniques for grouping unlabeled data and retrieving items similar to items of interest. After a short intro in week 1, the course covers k-nearest neighbor search, k-means clustering, Gaussian mixture models, latent Dirichlet allocation and hierarchical clustering. It is recommended that you complete the first 3 courses in the specialization track before taking this course, but you could take it as a standalone course as long as you know a bit of Python and probability. Grading is based on a series of comprehension quizzes and labs, but you must pay for a verified certificate to gain access to graded assignments. Thankfully you can still download and complete the labs without doing the associated quizzes, so you won't miss too much as a freeware student. Clustering and Retrieval has a good balance of lecture content and labs that illustrate concepts covered in lecture. The professor is easy to understand and the lecture slides are well done. The course generally has good pacing and devotes plenty of time to each of the main weekly topics, taking care to explain important considerations like different algorithmic approaches to each method and similarities between different techniques. It does, however, go off on a couple of tangents, introducing MapReduce and hidden Markov models, neither of which is covered in much detail or addressed in the labs. The labs use a data set of Wikipedia articles about famous people as an example to illustrate clustering and retrieval. Using the same data set for multiple labs is always a good idea because it lets students focus on the techniques themselves instead of having to familiarize themselves with new data. The amount of actual coding you have to do in the labs is minimal. 
The labs are more like interactive explorations of machine learning techniques with occasional one-line fill in the blanks than full-on coding assignments. You'll spend more time reading text, running provided code and analyzing results than writing code yourself. You can look at and answer the lab quiz questions as you go along but you can't actually submit them and get graded feedback without joining the verified track. Machine Learning: Clustering & Retrieval is a great course that covers many of the most common clustering techniques with adequate depth while remaining accessible. Although the coding required is minimal, it is not an easy course: some of the concepts may take a couple of watch-throughs to sink in and you may struggle with certain concepts if you don't have prior knowledge of probability. Aside from the need to pay to gain access to graded quizzes and a few topics that felt tacked on, there's not much to dislike about this course. I give Machine Learning: Clustering & Retrieval 4.5 out of 5 stars: Great.
Machine Learning: Classification is the third course in the 6-part machine learning specialization offered by the University of Washington on the Coursera MOOC platform. The first two weeks of the 7-week course discuss classification in general, logistic regression and controlling overfitting with regularization. Weeks 3 and 4 cover decision trees, methods to control overfitting in tree models and handling missing data. Week 5 discusses boosting as an ensemble learning method in the context of decision trees. Weeks 6 and 7 cover precision and recall as alternatives to accuracy for assessing model performance and stochastic gradient ascent to make models scalable. The course builds on the concepts covered in Machine Learning: Regression, so it is highly recommended that you take that course first. Assignments use GraphLab, a Python package that requires the 64-bit version of Python 2.7. You can technically complete the course with whatever language and tools you like, but using Python and GraphLab will make your life much easier because the assignments are designed around them. Like the previous course, basic knowledge of Python, derivatives and matrices is recommended, but the course doesn't get too deep into math. Grading is based on weekly quizzes and programming assignments. Machine Learning: Classification follows in the footsteps of the regression course, offering a good mix of high quality instructional videos and illustrative programming assignments. Carlos Guestrin takes the reins in the course (Emily Fox, the professor for the regression course, does not make an appearance) but the presentation format and style are mostly unchanged: videos break topics down into well-organized and digestible 1 to 7 minute chunks. The slides are crisp and generally uncluttered. Some of the most complicated sections are optional, so you can skip them without it affecting your performance on the programming assignments and quizzes. 
The programming assignments are provided in Jupyter notebooks--interactive text and code documents that run in your browser. They do a good job illustrating the concepts and walking you through the process of implementing machine learning algorithms. Although the course claims that you'll be implementing algorithms yourself from scratch, they provide a ton of setup, support and skeleton code: you don't need to define a single function yourself. Instead, you follow along with instructions and fill in key pieces of code in the bodies of certain pre-defined functions to get things working. Essentially every line of code you need to write has a comment giving you the gist of what you are supposed to do. Some may not appreciate this degree of hand-holding, but it keeps the assignments moving along steadily and puts the focus on learning and understanding concepts rather than coding details and debugging. My only major gripe with this course is with some of the decisions concerning which topics to cover. The course mentions random forest models briefly at the end of the section on boosting, but the topic warrants a little more detail. A single 5-8 minute video would have been enough. The course does not mention support vector machines at all. The professor stated in the forums that he may release some videos on SVMs in the future but they were not included at launch since they are more complicated than other models and do not scale well to large datasets. The section on decision trees only discusses misclassification error as a metric for splitting, failing to mention information gain or Gini impurity, which are often preferred in practice. Similarly, the boosting section focuses on AdaBoost, while stochastic gradient boosting and XGBoost in particular are often more successful in practice. The final week's title "scaling to huge data sets and online learning" is a little misleading because it only really covers stochastic gradient ascent and mini-batch gradient ascent. 
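To make the point about splitting metrics concrete, here is a minimal sketch in plain Python. It is not from the course or from GraphLab: the function names and the toy labels are invented for illustration. It scores two hypothetical candidate splits of the same 8 examples with misclassification error, Gini impurity and entropy (the quantity behind information gain).

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of the labels that fall into each class present."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def misclassification_error(labels):
    """Error rate when the node predicts its majority class."""
    return 1.0 - max(class_proportions(labels))


def gini_impurity(labels):
    """Chance that two labels drawn with replacement disagree."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits; lower means a purer node."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def weighted_score(metric, left, right):
    """Average a node metric over two children, weighted by size."""
    total = len(left) + len(right)
    return (len(left) * metric(left) + len(right) * metric(right)) / total


# Two hypothetical splits of the same data (four 1s and four 0s).
split_a = ([1, 1, 1, 0], [0, 0, 0, 1])  # both children 75% pure
split_b = ([1, 1, 0, 0, 0, 0], [1, 1])  # one mixed child, one pure child

error_a = weighted_score(misclassification_error, *split_a)
error_b = weighted_score(misclassification_error, *split_b)
gini_a = weighted_score(gini_impurity, *split_a)
gini_b = weighted_score(gini_impurity, *split_b)
entropy_a = weighted_score(entropy, *split_a)
entropy_b = weighted_score(entropy, *split_b)
```

On this toy data misclassification error rates the two splits identically (0.25 each), while Gini impurity and entropy both score split B lower because it produces a perfectly pure child. That extra sensitivity to purity is one commonly cited reason the latter two are preferred when growing trees.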
Machine Learning: Classification is a great first course for learning about classification that benefits from good organization and illustrative programming assignments. The course does, however, eschew some important topics in favor of simplicity; including a few more optional videos covering these topics would give the course the breadth and depth advanced learners desire without harming its accessibility. I give Machine Learning: Classification 4.5 out of 5 stars: Great.
Udacity's "Deep Learning" is a 4-lesson data science course built by Google that covers artificial neural networks. The first lesson builds up some machine learning background on classification problems, while lesson 2 discusses the basic machinery of neural networks and deep learning (neural networks with multiple layers). Lesson 3 covers convolutional networks for image recognition and lesson 4 covers recurrent networks and issues dealing with text data. This course assumes you have intermediate Python programming experience and basic knowledge of machine learning, statistics, linear algebra and calculus. Each lesson in the course consists of a series of short video lecture segments with occasional comprehension questions and breaks to apply topics discussed in programming assignments. The video quality itself is good and the lecture quality is adequate, but the lecture segments are very brief, with most lasting around a minute or less. The sum total of the video content in the third lesson on convnets is less than 15 minutes. The programming assignments, which use a popular neural network library called TensorFlow, are lacking in instruction and involve either running large chunks of provided code or working on open-ended questions. You likely won't be able to make much progress on the assignments without prior knowledge of machine learning and TensorFlow or doing a lot of extra research outside of the course materials. The programming problems also require significant computing resources; my laptop with 8GB of RAM ran out of memory when running the provided code in the first assignment. Deep Learning is a shallow course that is akin to reading CliffsNotes instead of a textbook: you'll learn some terminology and be exposed to some interesting concepts but its abbreviated coverage is likely to confuse students who are new to neural networks while leaving more experienced students unsatisfied. 
This course seems like a rushed attempt to capitalize on the hottest buzzword in the hottest tech industry, which is a shame because it could have been a good course if it took the time to cover the topics in adequate detail. I give Deep Learning 2 out of 5 stars: Disappointing. *If you're interested in learning about the topics this course introduces in much more depth, check out the video lectures and course materials for CS231n, a deep learning course focused on image recognition offered by Stanford.
Managing Big Data with MySQL is the fourth and final course in Duke University's Excel to MySQL: Analytic Techniques for Business specialization offered through Coursera. The 5-week course focuses on teaching students how to make relational database queries. Unlike some database courses that delve into details concerning database construction and theory, this course is all about the practical use of databases from the perspective of a business analyst. The first week introduces the concept of relational databases, entity relationship diagrams and schema, while the remainder of the course covers querying from simple select statements to summary functions, grouping, joins and subqueries. You don't need any particular background to take this course and it could be taken in isolation from the rest of the specialization. Grading is based on 4 week-end multiple-choice quizzes. Weekly course content is divided into several lessons that typically involve watching a short video segment and then working through an exercise set in MySQL or Teradata, two relational databases used in the course. The lecture content is high quality but after the first week, you'll be spending most of your time working on exercises rather than watching videos. In fact, some lessons don't have video lectures at all: the written exercises are really the core of the course. The MySQL exercises are contained in Jupyter notebooks--interactive text and code documents--that let you read instructions and play around with code in the same place. The exercises provide plenty of opportunity to drill SQL queries and build SQL vocabulary. The answers to exercise questions are provided in PDFs (they are ungraded), which means you can skip ahead if you don't need more practice. 
Considering each week after the first has at least 3 exercise sets plus a quiz, each of which could take a few hours to complete in its entirety, consulting the answer keys frequently is recommended to keep things moving along at a reasonable pace. At the end of each week after the first you'll do a final exercise set using Teradata and answer multiple choice quiz questions based on your results. You use the same real-world data set for each quiz--product information from Dillard's department stores--helping you build some familiarity with the data by the end of the course. The final week of the course doesn't cover any new material: it just contains the final quiz. Managing Big Data with MySQL is a great course for learning practical relational database querying skills with plenty of exercises that let you interact with real-life data sets. The focus on drilling ungraded exercises combined with sparing use of lectures after the first week does, however, make the course feel impersonal. It plays out more like a collection of training materials than the sort of university-style course you may expect from Coursera. I give Managing Big Data with MySQL 4.5 out of 5 stars: Great.
Managing Data Analysis is the third course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. The one-week course discusses the process of data analysis at a high level from formulating questions to exploratory analysis, inference, modeling and communicating results. Grading is based on several short comprehension quizzes. The lectures in Managing Data Analysis are of good quality and the instructor is generally easy to understand. The lectures do, however, use some jargon and concepts that aren’t always adequately explained. Unlike the first two courses of the specialization, which are geared toward managers, this course is more geared toward people who are actually going to be conducting data analysis. The concepts in this course are definitely important for data science managers to understand, but non-technical students may find this to be a jarring change of pace. In addition, certain parts may be confusing if you have had no prior exposure to statistics or machine learning other than the first two courses of this specialization. Managing Data Analysis provides a useful overview of the process of data analysis, but it is taught at a level appropriate for data analysts. “The Data Analysis Process” would be a more appropriate name for this course. I give Managing Data Analysis 3.5 out of 5 stars: Good.
A Crash Course in Data Science is a succinct, one-week overview of the field of data science produced by the same team from Johns Hopkins University that produced Coursera’s data science specialization. It is the first course in the “Executive Data Science” specialization, a data science track aimed at non-technical people like business managers. The course defines data science and then discusses different aspects of data science like statistics, machine learning and the structure, output and success metrics for data science projects. Grading is based on a handful of short multiple-choice comprehension quizzes. A Crash Course in Data Science is good for what it is: a brief overview of a field taught at a high level so that anyone can follow along. The professors have plenty of face time and explain concepts well, and the video quality is good. The content quality is a definite step up from the original Johns Hopkins data science track. The only real knock against this course is its brevity and the fact that it costs the full $49 to get a verified certificate if you want to complete the specialization. A course that you can complete in an hour or two should not cost the same as a month-long course. Students looking to sink their teeth into something substantial for the first month of the Executive Data Science specialization may be disappointed. A Crash Course in Data Science is a well-made primer on the data science field, but its brevity may leave paying students wanting. For freeware students I give this course 4 out of 5 stars: Very Good.
Data Science in Real Life is the fourth and final course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. The one-week course examines various steps in the data analysis process and contrasts ideal outcomes against the outcomes you are likely to experience in reality. Grading is based upon a few short multiple-choice quizzes. The lecture videos are crisp and the professor does a good job explaining the topics without being overly technical. The course does discuss some topics that you won’t fully appreciate without having hands-on experience doing data science projects, but it will help prepare you for some of the problems you might encounter. Like other courses in the Executive Data Science track, there’s not too much to dislike about this course other than its brevity and the limited depth at which topics can be covered in a one-week course. Data Science in Real Life is a nice, succinct overview of many of the challenges you are likely to face in data projects and suggestions for overcoming them. It raises considerations that could be useful for both data analysts and managers. I give Data Science in Real Life 4 out of 5 stars: Very Good.
Building a Data Science Team is the second course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. It is a one-week course that defines the different data science roles in an organization, what to look for in data scientists and strategies for managing and communicating with data scientists. The course has no prerequisites and grading is based on a handful of multiple-choice quizzes. The content in Building a Data Science Team is similar to the first course in the specialization: it is geared toward non-technical people who have to manage data scientists. The video quality is good and the instructor is personable, easy to understand and knowledgeable. There’s not too much to dislike about this course apart from its brevity. All of the courses in the Executive Data Science track are only a week long, so they can be completed in one or two learning sessions. This is not necessarily a bad thing: I find it refreshing to get a high-level overview of a topic in a short course, but it may not deliver the amount of content that paying students expect. Building a Data Science Team is a good course for what it is: a succinct primer on how to assemble and manage a data science team. I give this course 4 out of 5 stars: Very Good.
Foundations of Strategic Business Analytics is the first course in the “Strategic Business Analytics Specialization” offered by ESSEC Business School on Coursera. The 4-week course covers data analysis topics including clustering, exploring relationships between variables, forecasting and communicating results. All discussion is geared toward a business context, so the focus is on producing clear, actionable insight instead of looking at low-level details. The course uses the R programming language for analysis; basic familiarity with R is assumed. Grading is based on 3 quizzes and a peer-graded assignment. Each week consists of two main content sections: a lecture section that introduces concepts and data analysis techniques and then a recital section that teaches you how to use the methods discussed in lecture in R. The lecture videos themselves are polished with nice text graphics. The lecturer’s English takes a little time to get used to, but he speaks clearly and he does a good job framing each topic in the context of business. The programming recitals are easy to follow and let you get some hands-on experience with lecture topics right away. Foundations of Strategic Business Analytics is a nice introduction to thinking about data analytics in a business setting, but it is too short. Follow-up courses will hopefully let you sink your teeth deeper into the material. Also note that the specialization is listed as “Advanced”, but this course is not very technical and only really requires basic R knowledge as a prerequisite. I give Foundations of Strategic Business Analytics 3 out of 5 stars: Okay.
Machine Learning Foundations: A Case Study Approach is a 6-week introductory machine learning course offered by the University of Washington on Coursera. It is the first course in a 5-part Machine Learning specialization. The course provides a broad overview of key areas in machine learning, including regression, classification, clustering, recommender systems and deep learning, using short programming case studies as examples. The course assumes basic Python programming skills and it uses a software package called GraphLab that requires a 64-bit operating system running Python 2.7. Grades are based on periodic comprehension quizzes and short programming assignments. The course covers a broad range of machine learning topics at a high level with the promise of drilling down into the details in future courses in the specialization. The lecturers have good chemistry, but they tend to get distracted when they are on screen together. The video and slide quality are very good and although the delivery is a little rough around the edges at times, the lectures are informative. The machine learning methods covered aren’t necessarily treated as complete black boxes, but the course intentionally avoids getting too deep into the details, putting the emphasis on conceptual understanding. The weekly labs are contained in short IPython Notebooks--interactive text and code documents rendered in a web browser--that illustrate some simple models in GraphLab. The labs themselves are easy and don’t require much coding other than calling various built-in GraphLab functions. The hardest part about the class is getting your programming environment set up in the first place. If you don’t have a new version of 64-bit Python 2.7, you can’t run GraphLab. It is relatively easy to get set up if you can use the recommended Anaconda Python distribution, but getting things set up manually on an existing Python installation may prove troublesome. 
The instructors provided some workarounds for doing the course without GraphLab or using GraphLab on Amazon’s cloud computing service; I wouldn’t take the course without getting GraphLab working in some form. Many students decried the use of a non-open source package for an open class; I think it is useful to be exposed to new tools and GraphLab seems cleaner than Python’s popular scikit-learn package. In this sort of course, the focus should be on concepts rather than syntax. Machine Learning Foundations: A Case Study Approach achieves its goal of introducing machine learning at a high level without rushing or trying to cram too much into any particular week. What the professors lack in terms of polish they make up for with enthusiasm. Compatibility and setup issues will be a roadblock for some, but overcoming them is worth it. I give Machine Learning Foundations: A Case Study Approach 4.5 out of 5 stars: Great.
Data Visualization and Communication with Tableau is the third course in Duke University's "Excel to MySQL: Analytic Techniques for Business" specialization offered on Coursera. The 5-week course is essentially an introduction to Tableau (weeks 2 and 3) book-ended by some lectures on considerations and best practices for communicating data insights in a business setting (weeks 1 and 4). The final week is devoted to a peer-reviewed assignment and has no new lecture content. The course provides you with a free temporary license for the desktop version of Tableau. You can get through this course without any background knowledge, although some knowledge of MS Excel will help you appreciate some of the comparisons it makes. Grading is based on 4 weekly quizzes and a peer-graded assignment. Data Visualization has quality lectures that do a good job introducing Tableau in the context of creating visualizations for a business context. The Tableau walkthroughs are easy to follow and give you an appreciation for how much easier it is to make nice visualizations in Tableau than it is in Excel. You use the same data sets for the entire course, one data set for walkthroughs and one for homework assignments, which provides a nice sense of consistency. Weeks 1 and 4 raise some useful considerations to keep in mind when preparing for and presenting a data analysis, but the Tableau sections in weeks 2 and 3 are the heart of the course. I would have preferred more content covering the ins and outs of Tableau instead of the 2 weeks spent on communication topics, but the mix is probably about right for business-oriented students. I give Data Visualization and Communication with Tableau 4 out of 5 stars: very good.
Data Science and Machine Learning Essentials is a 5-week introductory data science course offered by Microsoft through edX that focuses on teaching students how to use Microsoft's cloud-based machine learning platform, Azure ML. The course divides content into two tracks, an R track and a Python track, so you can complete the course with either language, but you'll need to know the basics of at least one of the two. Grading is based on 5 weekly reviews and a single 20-question exam. The course title "Data Science and Machine Learning Essentials" is misleading because this course is not really about data science or machine learning per se. The first week attempts to cram an entire machine learning course or two worth of concepts into a handful of mediocre lectures, while the remainder of the course is all about Azure ML. Weeks 2-5 provide a nice overview of Azure ML and the fact that it has full lectures for both R and Python is a great feature that surely took a lot of extra time and effort to produce. The main lecturer's presentation skills aren't the best, but the videos are still easy to follow. Azure ML offers a lot of interesting functionality, like the ability to use Python and R scripts in the same project and publish projects as web services, but some of the exercises were tedious and ran slowly. If "Data Science and Machine Learning Essentials" were renamed "Intro to Azure ML" and only included the content in weeks 2-5, it would be a good course. Weeks 2-5 are definitely worth checking out if you are interested in Azure ML. As it stands now, however, the first week bombards students with far too many concepts explained too quickly to foster real understanding and sets the wrong expectations for the remainder of the course. I give Data Science and Machine Learning Essentials 2.75 out of 5: mediocre.
Excel for Data Analysis and Visualization is an intermediate-level course offered by Microsoft through the edX platform that covers cutting-edge techniques for gathering, transforming and viewing data in Excel. The course focuses on getting students up to speed with new features and techniques offered in Excel 2016, such as the Excel data model, queries, DAX (a syntax for defining functions) and Power BI, an online productivity service that integrates with Excel. This course assumes you have some familiarity with MS Excel, particularly pivot tables and slicers. You can complete the course with Excel 2010 or 2013, but if you don't have Excel 2016 you'll have to download add-ins and work slightly harder to complete the assignments. Grading is based on 7 weekly labs and 12 comprehension quizzes. Weekly content in DAT206x consists of one to three short video lectures describing new Excel features followed by a comprehension quiz. The amount of video content per week is usually under 30 minutes, so you shouldn't need to commit more than an hour or two a week to complete the course. The lecture videos have adequate resolution to see cell values and the lecturer's presentation is easy to follow. Weeks 1-7 have lab assignments that let you apply the techniques presented in lecture. You only get a couple of submissions for most lab and quiz questions, but most questions are not too difficult. Excel for Data Analysis and Visualization is a succinct, informative course on new Excel features that is worth checking out for those interested in going beyond the basics. Using Excel 2016 for this course when it launched only a few months before the course debuted may partially be a ploy to convince Excel users to upgrade, but I can't fault Microsoft for teaching with the latest version of their own product, and I completed the course with Excel 2010 without much difficulty. I give Excel for Data Analysis and Visualization 4 out of 5 stars: very good.
Data Visualization is the fifth and final course in the data mining specialization offered by the University of Illinois at Urbana-Champaign on Coursera. The 4-week course provides a high-level overview of data visualization, covering topics like human visual perception, basic plotting constructs and design principles, visualizing networks and visualizing databases. The course doesn’t have any particular prerequisites, but knowing how to make plots with some software package or programming language will be helpful for the assignments. Grading is based on two quizzes and two peer-graded visualization projects. The lecture content in Data Visualization is better than the lectures of the previous courses in the data mining specialization. The instructor is easy to understand and there isn’t as much dense technical content to absorb. On the downside, since the course focuses on high-level concepts, you won’t learn how to actually construct your own visualizations. It’s up to you to pick out software and figure out how to make visualizations with it. It would have been preferable for the entire data mining specialization to pick a programming language and stick with it throughout to pair concepts with specific implementations and exercises. Data Visualization is a nice introduction to visualization at a high level, but the lack of low-level technical instruction and exercises limits its practical usefulness, especially for students who don’t already know how to create their own visualizations. The course is a relatively smooth end to what is otherwise a rocky specialization, but since the content has no real connection to the other courses in the data mining track, you could take it as a standalone course. I give Data Visualization 3 out of 5 stars: Fair.
Statistics for Business I is a spreadsheet-focused statistics course offered by the Indian Institute of Management, Bangalore through the edX platform. The course spans 5 weeks including 4 weekly lessons and one week for a final exam. Course topics include descriptive statistics, variable summaries, the shape of distributions and probability. The course has no prerequisites other than having access to Microsoft Excel. You may be able to get by with a free alternative like LibreOffice, but the course lectures use Excel. Grading is based on lecture comprehension questions, exercises, caselets and a final exam. Weekly content in Statistics for Business I consists of a series of relatively short lectures interspersed with comprehension questions, followed by several exercises and caselets to let students apply what they’ve learned. The lectures themselves are well-made and strike a good balance between instructor face time and showing spreadsheet operations. The lead instructor, Shankar, is easy to understand and has some lighthearted yet instructive interactions with his brainy assistant Lysa (she’s a plastic brain that sits on his desk). Each week has a ton of comprehension questions and exercises to let students get practice with the spreadsheet operations and concepts presented in lecture. Hands-on practice is essential for skill building, so having plenty of exercises is a good thing. Statistics for Business I starts out slow, but the pace picks up toward the final lessons. Some students might feel that the last couple of lessons cover too many concepts in one week. Although having plenty of exercises is generally a good thing, the large number of easy, repetitive exercises grew tiresome. The course might benefit from making some of the exercises optional so that students who need more practice can get it, while those who don’t can skip ahead.
Statistics for Business I is a good course for learning how to deal with numbers in Excel, but the large number of graded exercises can make things tedious at times. This course is best suited for beginners in statistics with basic knowledge of spreadsheets and those who know some statistics and want more experience using Excel. Statistics for Business II is set to launch in October 2015. I give Statistics for Business I 4 out of 5 stars: Very good.
Scalable Machine Learning is a 5-week distributed machine learning course offered by UC Berkeley through the edX platform. It is a follow-up to another UC Berkeley course: Introduction to Big Data with Apache Spark. Although the first course is not a strict prerequisite, Scalable Machine Learning uses the same virtual machine and even has some overlap with the homework labs, so it is beneficial to take Introduction to Big Data first. Scalable Machine Learning teaches distributed machine learning basics using PySpark, Apache Spark’s Python API. Basic proficiency with Python is necessary to pass the course and some exposure to algorithms and machine learning concepts is helpful. Course evaluation is based primarily on 5 labs distributed as IPython notebooks. The first two weeks of the course cover machine learning basics and introduce Apache Spark. For students already familiar with machine learning basics who took Introduction to Big Data, there’s not much new to learn during the first two weeks. Week 2 is essentially an exact clone of week 2 of the intro to big data course, including the lab assignment. The final 3 weeks have meatier lecture content and longer labs, each covering a different machine learning technique: linear regression, logistic regression and principal component analysis. The lecture content is clean and the lecturer speaks clearly. His delivery isn’t perfect, but the only real purpose of the lectures is to serve as background information for the meat of the course: the labs. Each lab is a lengthy IPython notebook with several sections leading you through the process of creating a pipeline for running a machine learning algorithm with PySpark. Much of the code you need is provided for you, but writing the key functions and data transformations necessary to complete the labs can still be time consuming. Little things like an ambiguous instruction or an uncaught error you made earlier in the assignment can result in bugs that take a while to squash.
Despite occasional frustrations, the labs do a good job interspersing instruction with practical, hands-on learning. Scalable Machine Learning is a quality introduction to machine learning with PySpark that focuses on labs over lectures. The lectures could be better and some of the instructions and error checks in the labs could be more comprehensive, but this is a great course for those looking to learn by doing. I give Scalable Machine Learning 4 out of 5 stars: Very Good.
CS100.1x Introduction to Big Data with Apache Spark is a 5-week intro to distributed computing offered by UC Berkeley through the edX MOOC platform focused on teaching students how to perform large-scale computation using Apache Spark. The assignments use PySpark, Spark’s Python API, so some familiarity with Python programming is necessary. You don’t need prior exposure to big data or distributed computing to take the course. Grades are based on four programming labs (80%), easy comprehension questions that allow unlimited attempts (12%) and setup of the course virtual machine used to complete the labs (8%). Course lectures in Introduction to Big Data with Apache Spark are relatively brief and tend to stay at a high level, discussing general big data concepts rather than the details of Apache Spark. The instructor does a fine job in the few lectures the course offers, but there were not enough of them and they often felt disconnected from the assignments. The fifth week had no lectures. The labs are the core of this course. While you can breeze through weekly lectures in half an hour or less, each of the four labs is a lengthy reading and programming assignment packaged in an IPython notebook. Expect to spend 2 to 4 hours on labs 1, 2 and 4 and 3 to 6 hours on lab 3. The labs start by teaching basic Apache Spark manipulations and move on to some text analysis and machine learning. Using the IPython notebook to deliver labs is a convenient way to intermingle text and instructions with code. On the other hand, each exercise tends to depend on code executed somewhere above it, so a mistake made on an earlier exercise can lead to some odd errors later on, and Spark’s error traces aren’t particularly helpful. The course does provide some basic tests for each exercise, but it is easy to arrive at solutions that pass the checks but cause errors later on.
The course forums on Piazza are a vital resource for troubleshooting and disambiguation; I imagine some of the snags will be resolved in future offerings. Despite the occasional hiccups, the labs do a good job familiarizing students with Apache Spark’s Resilient Distributed Dataset objects and the various transformations and actions you can perform with them. Introduction to Big Data with Apache Spark is a great place to start learning about distributed computing if you know some Python. Although the lectures don’t add much technical depth to the course, they provide some big picture background that will be useful for students who have little prior exposure to big data concepts. The labs give you adequate opportunity to get your hands dirty with Apache Spark and gain basic familiarity with the data manipulations it offers. UC Berkeley is offering a follow-up course “Scalable Machine Learning” that builds on the foundation laid in CS100.1x. I give this course 4 out of 5 stars: Very Good.
6.041x: Introduction to Probability - The Science of Uncertainty is a comprehensive 16-week introduction to probability offered by MIT through the edX MOOC platform. Although this course is dubbed an “introduction” it is not easy. You need familiarity with differential and integral calculus to understand some of the material, and the course can easily take 10-15 hours per week. Given its 16-week duration, the time commitment required to get through everything is much higher than the average MOOC. The course touches on all the major topics you need to gain a solid understanding of probability including basic axioms of probability, conditional probability and independence, discrete and continuous random variables, Bayesian inference and the probabilistic underpinnings of classical statistics. The course grade is based on lecture comprehension questions, weekly homework assignments, 2 midterms and 1 final exam. The midterms are worth 15% apiece and the final is worth 30% so good performance on the exams is paramount to getting a good score. You need a total of 60% to pass and it isn't quite as easy to achieve that mark as it is in most MOOCs. Weekly content consists of 2-4 lecture sequences covering different aspects of a particular topic in probability. Each lecture sequence contains about an hour of video in 5 to 15 minute segments and most video segments are followed by graded comprehension questions. The lecture videos themselves are crisp and the professor is good at explaining the material at a pace that doesn't overload you with too much information too quickly. There can be quite a bit of mathematical notation on the screen at times, but it is well-organized. Each week also has a series of solved problem videos where TAs walk you through applying the material in lecture to problems that are similar to those you will see in the homework. The solved problems sections add another 1 to 2 hours of video content per week. 
Pure math courses usually aren't that fun because they spend a lot of time dealing with proofs and theory and not so much time dealing with the real world. This course can be a slog at times because it is long and there is a lot to absorb and remember, but after building up the basic tools of probability in the first few weeks, later weeks focus on more interesting extensions and applications. You won’t find another intro to probability with greater depth and breadth. This course is best suited for technical and math-minded people who will have to work with and apply probability in future coursework or in their professional lives. If you're looking for an intro that just gets you up to speed on the rudiments of everyday probability like coin flipping and dice rolling, this course is overkill. 6.041x: Introduction to Probability is a great course for those serious about forming a solid foundation in probability. As Professor Tsitsiklis states early on, "the first step in fighting an enemy like randomness is to study and understand your enemy." At the end of this course you will be armed with the tools necessary to wage a well-reasoned war against uncertainty. I give 6.041x: Introduction to Probability 5 out of 5 stars: Excellent.
Applications of Linear Algebra Part 2 is the second part of an introductory linear algebra course offered by Davidson College through the edX MOOC platform. The course spans 6 units and runs for 6 weeks, but all the lecture content and activities are available as soon as the course opens. The topics presented in part 2 build on the foundation laid in part 1 and include: least squares, correlation, eigenvectors, singular value decomposition, Markov chains, principal component analysis and sports prediction. Applications of Linear Algebra Part 2 follows the same pattern as part 1: each week consists of 2 to 3 short lectures, each with a corresponding activity that illustrates an application of the topic covered in lecture. This formula worked well in part 1 because the topics were relatively simple and the activities were provided via basic web apps. In part 2, the concepts are more complicated--too complicated for students to develop a solid understanding of them after one short lecture video. In addition, most of the activities in part 2 require running code in MATLAB. The course provides a free MATLAB license and tutorial videos, but it takes more effort to jump into activities. On the plus side, once you get them up and running, the applications in part 2 are even more interesting and fun to play with than the activities in part 1. Professor Chartier is personable and engaging in the lectures despite following a prompter/script. Although his voice is clear, he spends a bit too much time reading off the numeric contents of matrices, when it would be more instructive to have the matrices and other information on screen in persistent slides. Given the complexity of the material and brevity of the lectures, students aren't likely to fully understand the math unless they have taken a course in linear algebra before. I suspect the lectures are going to leave a lot of students scratching their heads.
It might have been wiser for the course not to purport to teach all the math behind the applications, but instead give a general overview of concepts before each activity and provide resources/references for students to learn about the math in greater detail. I don't normally advocate hand-waving, but as a course prioritizing applications over mathematical understanding, there are some instances where it may have been warranted. Overall, Applications of Linear Algebra Part 2 is another solid course that has a lot of interesting activities, but it is not as approachable as part 1 and tends to rush through complicated topics to get to interesting applications. I give Applications of Linear Algebra Part 2 a score of 4 out of 5 stars: Very Good.
CS188.1x: Artificial Intelligence is an introductory AI course offered by UC Berkeley through the edX MOOC platform. CS188.1x covers roughly the first half of the material in the full on-campus AI course in the span of 12 weeks. Major course topics include search algorithms and heuristics, constraint satisfaction problems, Markov decision processes and reinforcement learning. The course assumes you have taken a first course in algorithms, are familiar with basic data structures, have basic Python programming skills and are comfortable with mathematical notation. There isn't any particularly hairy math, but there are a lot of variables and symbols flying around at times. Grading is based on weekly homework assignments that allow unlimited attempts, 3 programming projects and a final exam that allows 1 or 2 attempts per question. CS188.1x is a direct adaptation of the on-campus AI course. The lecture videos are edited versions of lectures delivered on-campus, but instead of seeing the professor, we mostly see the presentation slides themselves with a voice-over from the professor. Direct adaptations of on-campus courses don't always work so well with MOOCs, but this course pulls it off perfectly. The professor speaks clearly and explains topics well. The lecture slides are extremely well-made, with clean text and even a bunch of cute robot and Pacman art to go along with the content. The videos are cut down into digestible 5 to 15 minute segments and there are practice comprehension questions following most of the videos that allow you to take a second to reflect and digest the content. Many courses that have great presentation fall flat when it comes to assignments. This is not one of those courses. The three Pacman-themed programming projects are among the best programming assignments I've encountered in any online course. Each project consists of several parts that involve implementing AI algorithms you study in class in the context of a Pacman game.
The course provides you with all the code you need to run the game, a variety of convenience functions and skeleton code that you have to fill in with algorithms that accomplish the prescribed tasks. The assignments can be frustrating at times, but seeing your code in action with a little Pacman racing around gobbling food pellets and ghosts is surprisingly gratifying. It also helps you gain a better understanding of how the algorithms work. Berkeley CS188.1x: Artificial Intelligence is one of the best MOOCs on the web. It is so good that many students on the forums were eager to take part 2. Unfortunately, the professors haven't gotten around to adapting the second half of the full AI course into a MOOC (they did express the desire to do so in the future), but they will give you access to an archived version of the full course upon request. I give Berkeley CS188.1x 5 out of 5 stars: Excellent.
Discrete Optimization is a quasi-self-paced programming course offered by the University of Melbourne through Coursera that is all about solving hard problems. Hard problems in the context of this course means NP-hard problems--problems with exponential worst-case running times. The course differs from most classes on Coursera and elsewhere on the web in that all the materials are available as soon as the course opens, but there is a final deadline for the programming assignments, so it is not a self-paced course in the truest sense. The entire course grade is based on 5 programming assignments: the knapsack problem, graph coloring, traveling salesman, warehouse location and vehicle routing. An average score of 7 (out of 10) on each part of each programming assignment is required to earn a certificate. Discrete Optimization opens with an introductory lecture series on the knapsack problem that lasts a couple of hours followed by three longer lecture series, covering constraint programming, local search and mixed integer programming. The lectures do not need to be viewed in any particular order. Similarly, students can work on the homework projects in any order they choose. This level of freedom is great for students who want to work ahead, but it may make it difficult to complete the course if you don't plan ahead because the programming assignments can be very time consuming. The assignment skeleton and submission code are written in Python 2.7, but you can use other languages if you want. The professor, Pascal Van Hentenryck, is extremely energetic and passionate about the subject. He makes the lecture videos surprisingly fun for such a dense subject.
The lecture videos themselves are well-made and the professor does a good job explaining the material, although I sometimes felt like the course was trying to cover too many different topics and it wasn't always clear how one would go about applying the methods in lecture to the assignments or using them without some external package or solver. A little more instruction and direction in that regard would be helpful. Discrete Optimization is a challenging course with great programming assignments that introduces many different tools and leaves them on the table for you to play with. The tools don't always come with full instruction manuals, so you'll have to figure out many of the details yourself. You won't have time to apply every tool to every problem, but if you focus on one and budget your time well, you'll have a good shot at making it through. I give Discrete Optimization 4 out of 5 stars: Very Good.
Text Retrieval and Search Engines is the second course in Coursera's new data mining specialization offered by the University of Illinois at Urbana-Champaign. The course covers a variety of topics in text data mining and natural language processing including text retrieval, query ranking and evaluation methods, and the basics of recommender systems. Grading is based entirely on 4 weekly quizzes consisting of 10 multiple-choice questions. You only get 1 attempt on the quizzes. The weekly content in Text Retrieval and Search Engines consists of around 10 video lectures that range from 5 to 20 minutes followed by a short 10-question quiz. If that sounds like a lot of lecture per question, it is, and there are no in-lecture quizzes to reinforce concepts as you go along. The lectures themselves are definitely a step up from the first course in the specialization, Pattern Discovery in Data Mining. The professor isn't hard to understand this time around and he explains concepts well enough to grasp them without having to re-watch videos. As with many of Coursera's other 4-week specialization courses, however, lectures sometimes turn into information dumps where the professor ends up reading off slides. The course does have a C++ programming assignment, which was nice to see. Text Retrieval and Search Engines is a decent course that is worth a look if you are interested in text data mining and search engines. Although the lectures are lackluster, they have some good information. If you're planning on getting a verified certificate, it is a good idea to try the practice quizzes before submitting the real one. I give this course 2.75 out of 5 stars: Fair.
Discrete Time Signals and Systems, Part 1: Time Domain is a 4-week introduction to discrete time signals offered by Rice University through the edX platform. This course was originally 8 weeks, but edX split it up into two parts, one covering the time domain and one addressing the frequency domain. Major course topics include signal properties, signals as vectors, linear time-invariant systems and convolution. The course requires some linear algebra and calculus (it has a pre-course assessment) as well as some basic programming in MATLAB. You don't need to know any MATLAB going in, but if you do you can skip the tutorial. Grading is based on a combination of comprehension questions, homework quizzes, peer-graded free responses and a final exam. All of the course content other than assignments is available immediately so you can work ahead if you want to. Discrete Time Signals and Systems started around the same time as a similar signal processing course on Coursera called "Digital Signal Processing." I found Discrete Time Signals to be much more approachable than the Coursera course; it introduces concepts at a steady but manageable pace and doesn't overload you with math right out of the gate. The course isn't easy, but it isn't too difficult. The lecture videos are well-done and the instruction is very good, although some videos could stand to be broken up into multiple parts. Professor Baraniuk tends to stutter, but it didn't really bother me or detract from the quality of the instruction. The MATLAB programming questions are baked right into the edX website and let you get some hands-on experience with the concepts. The final exam is "closed book," which I think is a mistake as it promotes guessing over learning. All in all, Discrete Time Signals and Systems Part 1 is an excellent introduction to signal processing that is likely to be more accessible than other courses on the same subject you may find elsewhere.
The stage is set for a deeper dive into signal processing in Part 2.
Applications of Linear Algebra Part 1 is a light, activity-focused introduction to linear algebra. This course is suitable for anyone who is curious about what linear algebra is and how it can be used in the real world, including high school students and advanced junior high students. The course doesn't go deep into the math, but rather focuses on thinking about data in terms of matrices and illustrating linear algebra operations with activities. The materials span 7 units that include activities ranging from image manipulation and animation to cryptography and sports prediction. Grading is very relaxed as you have unlimited attempts on comprehension quizzes and the remainder of the points are based on the activities. If you've taken a linear algebra course before, this class will be very easy, but you can still get some entertainment out of the activities and learn a bit about sports prediction. One of the biggest failings of math education is a heavy focus on rote repetition, which disconnects math from the real world and makes it boring. Applications of Linear Algebra is the type of course that is needed to raise interest in math. It introduces concepts at a digestible pace suitable for beginners and almost every lecture video that teaches a new concept is followed by an activity devoted to seeing that concept in action. Professor Chartier is clear and personable even though he seems to be working off a script--something that is not easy to do. The video quality is good and the activities, while simple, are illustrative. Applications of Linear Algebra Part 1 is a great course to get beginners interested in linear algebra by getting their hands on fun activities as quickly as possible. I hope to see Professor Chartier carry the same formula into Applications of Linear Algebra Part 2 and build upon the foundation laid in part 1.
Pattern Discovery in Data Mining is the first course in a new 5-part data mining specialization offered by the University of Illinois at Urbana-Champaign through Coursera. Keeping with the trend of other specialization courses, Pattern Discovery in Data Mining spans 4 weeks and will likely be offered again every month or two after the first offering. The course covers a range of methods for finding different types of patterns in data, such as association rules and patterns in graphs. Grading is based exclusively on 4 weekly quizzes. I was excited to see the new data mining specialization come up on Coursera to kick off 2015, but unfortunately, Pattern Discovery in Data Mining is a dull, poorly executed information dump. Besides an interesting topic, there’s not much going for this course. In the lectures, the professor reads information off dense slides and his delivery is more confusing than instructive. The slides, video and sound are of decent quality, but the explanations are not clear and while I normally don't have an issue with foreign accents, the professor's English made things harder to understand. To make things worse, there are few instructive in-lecture quizzes and no activities or programming assignments. A course about data mining should have programming assignments or activities that let students interact with the concepts to reinforce learning. Pattern Discovery in Data Mining is a disappointing start to the data mining specialization that suffers from poor instruction quality and a lack of illustrative assignments. Taking this course is like a data mining problem in and of itself: you have to spend a lot of time deciphering the lectures to uncover useful information.
Learning How to Learn is a 4-lesson self-paced course that summarizes key findings in neuroscience about how we learn. The course touches on brain function, working and long-term memory and various methods for improving learning as well as overcoming hurdles like procrastination. The lecture content in Learning How to Learn is very good. Videos aren't too long, the lecturer is clear and personable and everything is easy to understand. There are more bonus/guest lectures than you'd see with a typical MOOC and I find engaging, memorable guest lectures are rare. Also, you can't fully complete the course unless you verify your identity before submitting quizzes, even if you don't want a verified certificate. One of the main pitfalls with MOOCs is that you can get into the habit of watching hours of lecture content without taking time out to practice, recall and commit ideas into long-term memory. Good courses help students learn with quizzes and homework; this course teaches students other things they can do, such as making flash cards, taking breaks and getting adequate sleep, to maximize learning. Considering the main lecture content only takes a few hours to complete, this course offers a good amount of value for your time.
Intro to Relational Databases is a short 4-lesson course that covers the basics of SQL databases. Lessons 1 and 2 cover basic SQL querying, including grouping, ordering and inner joins, lesson 3 addresses inserts and concerns when using a database backend for a webapp and lesson 4 covers database design principles and a few more advanced features like outer joins and subqueries. I won't get into the final project as Udacity's projects tend to be geared toward students with subscriptions. Each lesson consists of several short videos with quizzes that involve multiple-choice questions and coding exercises that revolve around altering and submitting SQL queries. The instructor is easy to understand and explains things well. The content is polished and I didn't notice any bugs, which is rare for a brand new course. On the other hand, the course is a bit too short and doesn't give beginners enough practice with newly introduced syntax before moving on. It would be helpful to give students a few short drills writing queries related to each newly introduced keyword from scratch. Also, to follow along with lesson 3, you have to download, install and interact with a virtual machine. The time necessary to download, install and figure out how to use the VM is probably more than is warranted with such a short course, although the VM may be used for other Udacity courses. Intro to Relational Databases is a succinct overview of SQL basics that serves as a nice refresher for someone who has seen SQL before, but making it a little longer and providing more simple drills would probably be helpful for beginners.
Model Building and Validation is an advanced data science course provided by AT&T through Udacity. The course is listed as "advanced" because it assumes prior knowledge of machine learning, statistics, linear algebra and calculus. Despite the stated prerequisites, math doesn't play a large role, so you will still be able to understand most of the content even if your only preparation is Udacity's Intro to Machine Learning. The course spans 4 lessons that detail the process of extracting value from data through questioning, modeling and validation (QMV). Lesson 1 is a general introduction to the QMV process, with each of the following lessons digging into one component of QMV in more detail. The course somewhat oversells its length, as none of the lessons take more than a few hours despite the course being listed at an estimated 8 weeks with 6 hours of study per week. Model Building and Validation follows the same formula as other Udacity courses, with each lesson taking the form of a series of short lecture videos interspersed with quizzes. The lecturers are easy to understand and the video quality is generally good, although the videos and course materials have some glitches that need to be ironed out. I won't grade the course too harshly on bugs, since all courses are buggy at the very beginning, and they will likely be fixed in the near future. As for the content itself, the simple idea of framing a data analysis as a tree to track and organize the decisions you make along the way is probably the most useful thing you'll take away from this course. The course also does a good job getting students to think about some of the high-level decisions that must be made when conducting a data analysis. The content gets rockier when it delves into specifics after lesson 1, particularly in the models lesson. The lectures occasionally dive too quickly into the low-level details of machine learning techniques that students may not have seen before.
Additionally, the validation section focuses much more on model evaluation metrics, like ROC curves, the confusion matrix and the metrics derived from it, than on validation itself. Model Building and Validation is a good course that provides a nice framework for approaching data analysis, but it gets bogged down in some machine learning specifics that don't add much to the overarching theme.
Social and Economic Networks is an introductory network theory and analysis course geared toward learners who are comfortable with basic statistics, probability and linear algebra. You don't need to know anything about social networks ahead of time to take this course, but having basic familiarity with networks will help things go a bit smoother. The course has 7 weeks of lecture content covering network basics, measures of centrality, network formation models and diffusion, learning and games on networks. You'll also be introduced to Gephi, a software tool for network visualization and analysis. The 8th week is reserved for a final exam. Social and Economic Networks provides all the raw information you need to get a solid grounding in network theory and analysis, but the presentation style is impersonal so the content is not particularly engaging. The professor is knowledgeable and appears on screen while explaining lecture slides, but he shows little emotion. While the lectures can get a bit intimidating with equation after equation, the homework exercises and final exam are easier than the lectures might suggest. You get 2 attempts on each chapter quiz and 1 attempt on the final; a score of 70% or more is required for a certificate and 90% or more will earn you a certificate with distinction. All in all, Social and Economic Networks is a worthwhile course if you are interested in social networks and aren't intimidated by a bit of math, but I wouldn't take it for fun. If you want to take a course on the same subject that is less mathy, consider Coursera's Networked Life from UPenn.
Networked Life is a gentle introduction to network/graph theory that covers the basics of network structure, network formation models and networked games. The course consists of 7 weeks of lecture content--typically three 8-20 minute videos per week--with an 8-10 question quiz for each video. The quizzes aren't too difficult and you get 2 attempts, but since there is one quiz for every lecture video, you'll be spending a significant proportion of your total class time answering quiz questions. The course doesn't get into network algorithms or computing: it focuses on basic network structure, formation and games, so you can take this course without any programming or math background. Networked Life debuted about 2 years ago, making it among the first courses available on Coursera, so the presentation and slide quality are a bit dated. The lecturer mainly reads directly off slides and you spend the majority of lecture time looking at static slides written in Comic Sans as the lecturer explains them in greater detail. The information is solid and generally interesting but the presentation is often a bit dull when there are no illustrations on the screen. The quizzes are probably the best part of the course; even though they are easy, they help reinforce the content and break up what might otherwise become a tedious slog through lecture video after lecture video. The course is self-paced, so despite it having "7 weeks" of content, you can finish it faster if you want to. Networked Life is an accessible introduction to networks and while the presentation isn't great, the topics are interesting and the frequent quizzes help keep you engaged.
Machine Learning, taught by Coursera co-founder Andrew Ng, is one of the first programming MOOCs Coursera put online. Although Machine Learning has run several times since its first offering and doesn’t seem to have been changed or updated much since then, it holds up quite well. This course assumes that you have basic programming skills. Assignments also require many vector and matrix operations and slides include some long formulas expressed in summation notation, so it is recommended to have some familiarity with linear algebra. You don't need to know calculus or statistics to take this course, but you may gain deeper insight into some of the material if you do. The course uses the Octave programming language, a free-to-use clone of MATLAB. The course runs 10 weeks and covers a variety of topics and algorithms in machine learning including gradient descent, linear and logistic regression, neural networks, support vector machines, clustering, anomaly detection, recommender systems and general advice for applying machine learning techniques. Lectures are split into 3 to 15 minute segments with periodic quizzes and each topic section has a corresponding quiz. Section quizzes are worth 1/3 of the total grade but you get unlimited attempts (with a 10-minute retry timer). Andrew Ng does a good job explaining dense material and slides, although the audio levels are often too low. If you don't have good speakers you might need headphones to hear him talk. The other 2/3 of the course grade is based on 8 multi-part programming assignments that typically involve filling in code for key functions to implement machine learning algorithms covered in lecture. The course gives you a lot of structure and direction for each homework, so it is generally pretty clear what you are supposed to do and how you are supposed to do it even if you don't understand 100% of the material covered in lecture. Machine Learning is a great course if you can get past quiet audio.
If you've never used Octave or MATLAB before, don't let that stop you from taking this course; learning the basics necessary to do the assignments only takes a couple of hours and it will help you think of things in terms of vectorized operations.
The Hardware/Software Interface covers computing from the level of the CPU to a low-level programming language: C. Course content includes binary logic, C basics, C structs and arrays, x86 assembly, the stack and heap, caches, processes, virtual memory, memory allocation and differences between Java and C. The course consists of lecture videos with periodic in-lecture questions and several programming exercises. The presentation of material is good and the professors are easy to understand. On the other hand, the lectures didn't always cover everything you needed to know to tackle the homework; if you come into this course without any C experience, you'll probably need to do a bit of outside reading to tackle some of the homework. I also found myself getting a bit bored with this course due to some long puzzle-like programming assignments and the low-level nature of the course. Overall, this is a quality MOOC focused on low-level computing--a topic that is not covered in many online courses--but it takes a lot of time and attentiveness to complete all the content.
Machine Learning 2: Unsupervised Learning is the second part of a 3-part machine learning course offered by Georgia Tech through Udacity. It is recommended that you take the first part before this course as the lecturers reference material from the first part from time to time. This course is much shorter than part 1, spanning only 4 lessons: methods for optimization, clustering, feature selection and feature transformation. There is also a supplementary section on information theory. The course format and quality mirror part 1: the lecturers alternate taking on the role of teacher and student and introduce new material at a quick clip. I'm not sure if "teacher as student" works too well here, because the lecturers "catch on" almost instantly while real students are likely to need a little more time. The lecturers also have a tendency to dive into quizzes without adequate explanation of the problem. There is a single homework problem set at the end of the course that takes about 10 minutes and a final project about building a recommendation system. I would have liked to have seen short homework sets and programming exercises for each topic section, but the chemistry and wit of the lecturers help keep you engaged. If you enjoyed part 1, you’ll enjoy part 2.
Statistical Inference is the 6th course in the Johns Hopkins data science specialization track and is basically an introduction to statistics in R. The course covers many different topics in the span of 4 weeks, from basic probability and distributions to t-tests, p-values and statistical power. The lectures take the form of slideshows with a lot of dense mathematical notation, small text and mediocre voiceovers. The course tries to cover too much ground too fast and the material isn’t presented in a way that is easy to understand or engaging. I don’t think the lecturer’s face was shown once in the entire course. That’s not to say there isn’t good information in the lecture slides, but the presentation and execution are poor. If you’re looking for a good introduction to statistics that uses R, try Duke’s Data Analysis and Statistical Inference. Udacity’s “Statistics” is another solid option that is self-paced, moves a bit slower and does not require programming.
Effective Thinking Through Mathematics is a course about increasing your ability to tackle new problems and to better understand things you already know. The course focuses on 4 main elements of effective thinking: understanding simple things deeply, making mistakes, raising questions and following the flow of ideas. Although this course has “mathematics” in the title, it is really about the process of thinking; math is just a convenient arena to teach these methods. You don’t need any particular math background to take this course and get a lot out of it, although be aware that most of the 9 weekly lessons deal, at least in part, with mathematical concepts like numbers, infinity, dimensionality and geometry. The course format is a little different from most MOOCs: each week consists of a series of videos where the professor gives problems to students and the students attempt to work through them. The professor helps the students reason through the problems by making suggestions and asking questions and he periodically addresses the viewer, explaining how the effective thinking methods were or should have been applied by the students. The nontraditional course format may be off-putting to some viewers, since the students spend quite a bit of time struggling with the problems, making little progress. I found it to be an interesting approach, although increasing the video speed is useful for times when things get too slow. This lighthearted course introduces some intriguing concepts and lays the foundation for approaching problems in a way that lets you gain new insights and deeper understanding. The main issue I had with the course was that it did not provide enough challenging puzzles and homework problems for MOOC students to apply the methods discussed in lecture. Everything felt a bit too easy. Despite that, this is a fun course that teaches methods that could be useful in almost any sphere of life and doesn't require a big time commitment.
Calculus One is a comprehensive introductory calculus course that covers everything you'd expect in a first year university calc class: limits, derivatives, integrals and applications of each. The instructors have a lot of passion for the subject and provide plenty of examples to help students learn the material. They also have a nice interactive quiz platform called MOOCulus that lets you go through practice problems online to your heart's content. This is a great course for anyone seeking to learn calculus for the first time or relearn it later in life. The only downside to this course is that it is longer than most MOOCs--16 weeks--so it can be hard to keep up with the weekly schedule. If you take a lot of MOOCs, you may find that you get too busy with other newer ones to stick to the schedule.
Intro to Statistics is one of Udacity's older courses and while it was one of the few free stats courses on the web when it was released, it has a lot more competition today. Intro to Statistics is a decent course that covers some of the most basic topics in statistics. The course moves fairly slowly with periodic spurts of difficulty. While most MOOCs underuse interactive elements, I found that this course had too many in-lecture quizzes, which just end up becoming tedious. If you're looking for a basic intro to stats, Udacity's other stats course "Statistics" is a better option.
Udacity's "Statistics" is provided by San Jose State University and offers a comprehensive introduction to statistics. This course should not be confused with Udacity's "Intro to Statistics" taught by the founder of Udacity, Sebastian Thrun. Topics covered in this course include research methods, visualizing data, measures of center and spread, z-tests, t-tests, ANOVA, the chi-squared test, correlation and regression. This course has a ton of content that is well presented and covers each topic in great detail, with many quizzes and homework exercises after each lesson to reinforce learning. The pacing of this course is fairly slow, so it is perfect for someone who has never taken a statistics course before or someone who isn't super confident in their math skills. Just be aware that completing all the content will take a significant time commitment, likely 60-100 hours. I would recommend this course over Udacity's “Intro to Statistics.” If you want a course that moves faster and gives you the chance to do some computation, I recommend Data Analysis and Statistical Inference offered by Duke through Coursera.
Getting and Cleaning Data is the third course in the first wave of Johns Hopkins’s data science specialization track on Coursera. It is recommended that you take this course after taking The Data Scientist's Toolbox and R Programming courses. The title of the course pretty well sums up the content: the entire class is about loading data into R and cleaning it up so that it can be used for data analysis. You'll learn how to load various data formats into R, such as JSON, XML, CSV and Excel files, and get data from other sources like MySQL and web APIs. The course also discusses subsetting data, adding variables, merging data, regular expressions and working with dates. This course is a good summary of many of the things that are useful to know when trying to access and prepare data for analysis. Similar to R Programming, it suffers from overuse of static slides with voice-overs, a lack of instructor face time and a lack of interactive content or in-lecture quizzes to help you learn and retain as you go along. You'll be introduced to many R packages and syntax that you probably won't remember after a week or two, but you'll be exposed to many common data formats so that you can refer back to the course materials or other web resources to deal with them in the future.
A Beginner’s Guide to Irrational Behavior provides a nice overview of key topics in behavioral economics, including money, dishonesty, motivation, self-control and emotion. I only watched the video lectures for this course, so my review won’t touch on the readings, quizzes or assignments. The lecture content is engaging and raises many interesting ideas and questions about the way people think and act. Just be sure to maintain a healthy degree of skepticism and realize that the professor is only sharing his views based on his research and experience.
Exploratory Data Analysis is the third course released as a part of Udacity's new data science focus area that launched at the beginning of 2014. The course provides an overview of using R to explore data and focuses heavily on the use of the ggplot2 package in R to create data visualizations. Although the course touches briefly on high-level theory and concepts like summary statistics, transforming data, correlation and linear regression, almost all of the quizzes and homework questions have to do with creating plots and making observations based on plots. This is not necessarily a bad thing--learning to plot in R is a valuable skill and an important part of exploratory data analysis--but it seems like the course should have spent a bit more time covering high-level concepts and numeric methods for exploring data like using tables and summaries. Despite that quibble, this is a good course with a lot of high quality and practical content. It moves slowly enough for you to get comfortable with basic plotting syntax before building up to more complex visualizations, but fast enough to keep you engaged. Be aware that the course mainly uses two data sets to teach the material: a data set of diamond prices and characteristics and a set of pseudo Facebook data created by the instructors meant to mirror real Facebook data, such as friend counts, tenure on the site, user age and gender. Your enjoyment of the class will depend, in part, on your interest in the data.
The Data Scientist’s Toolbox is essentially just an overview of the data science specialization track offered by Johns Hopkins University through Coursera. The track consists of 9 courses that each last about 4 weeks and are released in batches of 3 courses each month. This course introduces the very basics of R and RStudio, Git and GitHub and a few other things that will be used in the data science specialization. It is basically a bunch of introductory and supplementary material that shouldn't be a standalone course. You can complete all the lecture videos in the entire course in about 2 hours. It's almost embarrassing that Johns Hopkins has a paid verified certificate option for this course; what's worse, it is required to complete their data science specialization track. I suspect this will be a major turnoff for students interested in the track.
Intro to Data Science is an intermediate level course that assumes basic Python programming skills and knowledge of statistics. The course focuses on gathering, manipulating, analyzing and visualizing data using Python and various Python packages such as numpy, scipy and pandas. One of the best parts about this course was getting some exposure to some Python packages in the scipy stack, although I wish more time was devoted to explaining what the various modules in the scipy stack do, how to set them up at home and when to use them. The first lesson is a fairly gentle introduction with an interesting homework project dealing with data from the Titanic disaster. Lesson 2 goes into more detail about gathering and cleaning data using pandas and an additional module that lets you make SQLite-style queries to extract data from pandas data frames. Lesson 3 jumps into data analysis with a t-test and linear regression using gradient descent. Going from basic data manipulation into these topics was a bit jarring in terms of difficulty and more time could have been spent explaining how the functions worked. I left without a great appreciation of what gradient descent is really doing. Lesson 4 is focused on making visualizations using a module that attempts to port the functionality of the R language’s ggplot2 plotting package. Finally, lesson 5 introduces the concept of big data and MapReduce as a solution to deal with large data sets. Each homework assignment after the first has students dealing with New York subway turnstile data, which allows you to get some level of familiarity with the data throughout the course. This was a very good decision, since it lets you focus on learning new concepts rather than spending time familiarizing yourself with new data sets over and over again. Intro to Data Science introduces some major topics in data science and does a pretty good job given the amount of content it offers, but coverage of the topics is too brief.
Hopefully the forthcoming Udacity courses Exploratory Data Analysis and Data Wrangling with MongoDB will build on the foundation provided by this course and give students a bit more depth.
A fun, short introduction to the design of objects. My main complaint is that the course is quite short and yet they want you to do a fairly involved final project. The size of a final project should be proportional to the amount of material and effort put into the class before the final project. A course with only 8-9 hours of material shouldn't require 7+ hours on a final project.
I took this course through Yale's open courseware back in 2010 before most of today's big MOOC platforms existed. It is a 26 lecture philosophical discussion of death. A very interesting class for anyone interested in death. There's no work to complete other than watching the lectures, which takes around 20 hours.
A basic introduction to Python. Codecademy has improved its materials a bit since it first launched; this is a decent course for learning basic syntax, functions and data structures. It's a good place to start to get a little bit of familiarity with Python before taking a full-length intro to CS course that uses Python.
A fun and informative course on the design side of the web, including font styles and sizes, colors and page layout. Another great offering from Code School with highly polished materials and exercises.
Great interactive overview of Git. Superb quality level in materials and exercises. They set you up in a sandbox environment where you are actually interacting with Git repositories. Highly recommended for anyone who wants to learn how to use Git.
A decent interactive introduction to Ruby. This seems like it must have been one of Code School's earlier creations because it doesn't have the same structure or the amount of polish put into most of their other classes. There are no videos: it is an entirely text-based course, much like Codecademy courses.
A nice, quick, interactive introduction to R. High quality instruction and examples. I went through this as some extra background for Coursera's Computing for Data Analysis class.
A nice quick intro to Git. I'm not sure why other reviewers rated it so low. Sure, it is very basic, but it is a lot more fun than reading a static text page.
Udacity's Web Development course provides a high quality introduction to back-end web development with Python using Google App Engine. The course is taught by Steve Huffman, creator of Reddit, which gives him many unique insights about web development and scaling websites. If I were to give this course a grade just based on the video lectures and quizzes themselves, it would be 5 out of 5, hands down. The video lectures are very well made and quizzes help reinforce the material without being too difficult. The class covers a wide range of topics including HTTP requests, basic HTML, getting user input, databases, user authentication, cookies, caching, scaling and APIs. The homeworks in this course all have to do with creating and deploying web applications using Google App Engine, primarily building and adding features to a blog. The homeworks, especially when you start building the blog, are a bit open-ended and probably more complex than the average student would be able to complete on their own. The lectures don't always provide all the things you need to know about Google App Engine to complete the assignments. Another annoying aspect of the homeworks is that Steve uses the Jinja2 templating engine in all his solutions, but he doesn't teach students how to use it. If you're willing to spend a lot of time doing outside reading (App Engine docs, Jinja2, etc.), you might get through the homework on your own, but in the end I found it more effective to look at Steve's solutions and study how and why they worked.
This course is an overview of what software testing is and of different testing methods. It focuses mainly on test coverage, random testing and the theory of testing in general. It doesn't provide much Python-specific information outside of using assert statements to catch problems early. The material is a bit dry and it would have been nice if it covered Python testing tools like unittest in detail in addition to the language-neutral testing techniques.
Algorithms: Design and Analysis, Part 2 picks up where part 1 left off. Several of the algorithms and discussions in Part 2 refer back to concepts discussed in the first part, so it is highly recommended to complete part 1 first. A few of the major topics covered include minimum spanning tree algorithms, the knapsack problem, dynamic programming, shortest path problems, the traveling salesman problem, P vs. NP and NP completeness and heuristics for hard problems. Part 2 is considerably harder than part 1 and the algorithms you write for homework need to be implemented well to get answers in a reasonable amount of time and without exceeding your system's memory. It is possible to complete the class using a high-level language (I used Python) but you'll probably have to spend a bit more time tweaking your code to make it fast enough. Like part 1, the instruction quality and assignments are top notch. My biggest gripe with the class is that the coverage of the P vs. NP question and NP completeness is brief, so students don’t gain a deep understanding of what P vs. NP and NP completeness really mean. Introduction to Theoretical Computer Science by Udacity provides a much more thorough overview of that particular topic. That said, Algorithms: Design and Analysis, Part 2 is another outstanding offering by Stanford and Coursera.
Algorithms Part 1 is an excellent introduction to the study of algorithm analysis and design. The course teaches some fundamental principles of algorithm analysis like big O notation and other important topics in algorithm design like data structures to represent graphs, the divide and conquer paradigm, heaps and hash tables. Algorithms discussed include quick sort, breadth first search, depth first search, finding strongly connected components of a graph and Dijkstra’s shortest path algorithm. The course requires the ability to program, but it is language neutral, meaning you can use whatever language you are most comfortable with to complete the assignments. The material is fairly dense and the quizzes and programming assignments are difficult if you haven’t taken a course on algorithms before. I’d highly recommend this course to anyone that wants to get serious about going beyond basic programming/scripting and learning some real computer science.
Introduction to Theoretical Computer Science is all about identifying and tackling hard problems. The quality of the material and instruction is excellent. Sebastian Wernicke breaks down complex topics in a way that is easy to understand. Central topics include the P vs. NP question, NP completeness and strategies for dealing with NP-complete problems. The class uses a few related graph problems--vertex cover, independent set and clique--to introduce and discuss the central topics. It also covers a few other interesting problems like traveling salesman and 3-SAT. As the name implies, this course is heavy on theory. As such, there is not a lot of actual programming you have to do to complete the course. There are a few programming problems, but quizzes and homework mostly revolve around multiple choice questions that get you to think about and master the concepts presented in lecture. Since it is light on programming, the course goes quickly if you don’t have to re-watch the lectures too many times to understand the material. Even though this class is about theory, you will learn practical things like preprocessing data to speed up algorithms. I highly recommend this course to anyone with curiosity about the P vs. NP question and solving hard problems.
Model Thinking looks at the world through many different lenses, which can lend insight into why the world and people work the way they do. This course can be likened to a college elective: it is fun, the workload isn't too high, the difficulty is relatively low and the material is interesting.