Data Science, Apache Spark & Python: Analyze Real Data!
If you want to support me: When you purchase the course through my link here, I receive a higher sales commission from Udemy than if the course is accessed and purchased through the platform directly.
Udemy assures me that you will receive the best price at which this course is currently sold on the Udemy marketplace, so you should not pay more through my link than if you bought the course directly on the platform.
The goal of data science is to generate knowledge from structured and unstructured data. Data analysis and visualization, for example, create new foundations for decision-making.
For particularly large amounts of data, known as Big Data, conventional spreadsheet programs like Excel are no longer suitable, and more specialized tools are needed. Apache Spark is a cluster-computing framework that distributes calculations across multiple computers, which makes it possible to process very large amounts of data.
This course for future data scientists and other data enthusiasts teaches Apache Spark using the Python programming language. Participants should already have some programming experience. After this course, they will be able to evaluate and visualize statistics independently. More specifically, they will master topics such as:
- Resilient Distributed Datasets (RDDs)
- DataFrames
- Spark SQL
- MapReduce
- Spark on a cluster in the AWS cloud (Amazon Elastic MapReduce, EMR)
- Creating graphics with Matplotlib
This course offers not only exercises but also projects that work with real datasets: glacier statistics are analyzed, taxi data is visualized, word frequencies in an e-book are determined, and US birth statistics are evaluated.
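Counting word frequencies is a classic use of the MapReduce pattern listed among the topics. The following is a minimal sketch of that pattern in plain Python, not Spark code and not taken from the course material: the function names and the sample sentences are made up for illustration. In Spark, the same idea is typically expressed with RDD operations such as flatMap, map, and reduceByKey, with the work spread across the machines of a cluster.

```python
import re
from collections import Counter
from functools import reduce


def map_words(line):
    """Map step: turn one line of text into (word, 1) pairs."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def reduce_counts(totals, pair):
    """Reduce step: add one (word, count) pair to the running totals."""
    word, count = pair
    totals[word] += count
    return totals


def word_frequencies(lines):
    """Apply the map step to every line, then reduce all pairs to totals."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce(reduce_counts, pairs, Counter())


# Hypothetical sample text standing in for the lines of an e-book.
sample = ["The glacier melts.", "The taxi waits; the glacier does not."]
frequencies = word_frequencies(sample)
print(frequencies.most_common(2))
```

On a single machine, Counter alone would be enough. The explicit map and reduce steps are kept separate here only to mirror how a cluster framework splits the work.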