Tag Archive: rdd

Apache Zeppelin with Spark on Docker on Microsoft Azure

Generating a normal distribution with Apache Zeppelin running in a Docker container on the Microsoft Azure cloud platform

Creating this project was mainly motivated by a wish to try out the capabilities of Apache Zeppelin, which seems to have a lot of potential in the hands of a data scientist. The project is built around a dice game. The user can determine how many dice we throw and how many times we throw them. Exciting so far, right? What is more exciting is that, with a sufficient number of throws, we expect a Gaussian distribution curve to emerge. Let's see whether it does.

Apache Zeppelin

Apache Zeppelin is basically a notebook that focuses mainly on the Apache Spark engine. A…
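
To make the dice game described above concrete, here is a minimal sketch in plain Python. It is not the project's own Zeppelin/Spark code. It assumes that each throw is scored by summing the faces of all the dice, which the excerpt does not state explicitly. The names `throw_dice` and `print_histogram` and the `seed` parameter are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def throw_dice(num_dice, num_throws, seed=None):
    """Throw `num_dice` six-sided dice `num_throws` times.

    Returns a Counter mapping each observed sum to how often it occurred.
    Summing the faces per throw is an assumption made for this sketch.
    """
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    sums = Counter()
    for _ in range(num_throws):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        sums[total] += 1
    return sums


def print_histogram(counts, width=50):
    """Print a text histogram, scaling the most frequent sum to `width` marks."""
    peak = max(counts.values())
    for total in sorted(counts):
        bar = "#" * (counts[total] * width // peak)
        print(f"{total:3d} {bar}")


if __name__ == "__main__":
    # 10 dice thrown 100,000 times; the bars should trace a bell-like shape.
    print_histogram(throw_dice(num_dice=10, num_throws=100_000, seed=42))
```

With a single die the histogram stays roughly flat. The bell shape only appears once several dice are summed per throw, and more throws mainly smooth out the noise in the bars.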