Big Data Analysis with Scala and Spark
Provided by:

Course Details
Cost
FREE. Add a Verified Certificate for $79.
Upcoming Schedule
- Upcoming
Course Provider

Coursera online courses
Coursera's online classes are designed to help students achieve mastery over
course material. Some of the best professors in the world - like neurobiology
professor and author Peggy Mason from the University of Chicago, and computer
science professor and Folding@Home director Vijay Pande - will supplement your
knowledge through video lectures. They will also provide challenging
assessments, interactive exercises during each lesson, and the opportunity to
use a mobile app to keep up with your coursework. Coursera also partners with
the US State Department to create “learning hubs” around the world. Students
can get internet access, take courses, and participate in weekly in-person
study groups to make learning even more collaborative. Begin your journey into
the mysteries of the human brain by taking courses in neuroscience. Learn how
to navigate the data infrastructures that multinational corporations use when
you discover the world of data analysis. Follow one of Coursera’s “Skill
Tracks”. Or try any one of its more than 560 available courses to help you
achieve your academic and professional goals.
Provider Subject Specialization
Humanities
Sciences & Technology
Course Description
Manipulating big data distributed over a cluster using functional concepts is common in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop and, more recently, Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, such as shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution, such as latency and network communication, should be considered and how they can be addressed effectively for improved performance.
Learning Outcomes. By the end of this course you will be able to:
- read data from persistent storage and load it into Apache Spark,
- manipulate data with Spark and Scala,
- express algorithms for data analysis in a functional style, and
- recognize how to avoid shuffles and recomputation in Spark.
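As a rough illustration of the functional style these outcomes refer to, here is a minimal word-count sketch on ordinary sequential Scala collections. It is not taken from the course materials: the object name `WordCountSketch`, the helper `wordCounts`, and the sample lines are all hypothetical, and the code uses only the Scala standard library rather than Spark itself.

```scala
object WordCountSketch {
  // Hypothetical sample input standing in for lines read from persistent storage.
  val lines: Seq[String] = Seq(
    "spark extends data parallelism",
    "scala collections and spark collections"
  )

  // Functional pipeline: split each line into words, drop empty tokens,
  // group equal words together, and count the size of each group.
  def wordCounts(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    // Print the counts in alphabetical order of the words.
    wordCounts(lines).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"$word $count")
    }
}
```

A pipeline of the same shape can be written against a Spark RDD, where `flatMap`, `map`, and `reduceByKey` play the corresponding roles. On a cluster, the grouping step is where a shuffle occurs, which is why `reduceByKey` (which combines values on each partition before moving data across the network) is usually preferred there over grouping all values first.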
Recommended background: You should have at least one year of programming experience. Proficiency with Java or C# is ideal, but experience with other languages such as C/C++, Python, JavaScript, or Ruby is also sufficient. You should have some familiarity with using the command line. This course is intended to be taken after Parallel Programming: https://www.coursera.org/learn/parprog1.
