Tag Archive: apache

How to schedule Cloudera Impala data pipelines in Apache Oozie?

Oozie is a tool built on Hadoop for creating and scheduling workflows. We can use it to build data pipelines whose components can be Java code, Sqoop, Pig, Hive, shell scripts and so on. Inside a workflow, jobs can be defined to run either in parallel or in sequence. HUE provides a graphical interface for Oozie, where we can conveniently define, manage and monitor our jobs. Components…

Apache Zeppelin with Spark on Docker on Microsoft Azure

Generating a normal distribution with Apache Zeppelin running in a Docker container on the Microsoft Azure cloud platform

This project was mainly motivated by a wish to try out the capabilities of Apache Zeppelin, which seems to have a lot of potential in the hands of a data scientist. The project is built around a dice game. The user can choose how many dice we throw and how many times we throw them. Exciting so far, right? What is more exciting is that, with a sufficient number of throws, we expect a Gaussian distribution curve to emerge. Let's see whether it does.

Apache Zeppelin

Apache Zeppelin is basically a notebook that focuses mainly on the Apache Spark engine. A…
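The dice game from the Zeppelin excerpt can be sketched in a few lines of plain Python. This is only a minimal illustration and not the code of the original project, which runs on Spark inside Zeppelin. It assumes that each throw is scored as the sum of the dice, and the names `throw_dice` and `histogram` are hypothetical helpers made up for this sketch.

```python
import random
from collections import Counter


def throw_dice(num_dice, num_throws, seed=None):
    """Throw `num_dice` six-sided dice `num_throws` times.

    Returns one value per throw: the sum of the dice in that throw
    (scoring by sum is an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice))
        for _ in range(num_throws)
    ]


def histogram(sums):
    """Count how often each total occurred, ordered by total."""
    return dict(sorted(Counter(sums).items()))


if __name__ == "__main__":
    # Hypothetical parameters: 10 dice, 10,000 throws.
    counts = histogram(throw_dice(num_dice=10, num_throws=10_000, seed=42))
    peak = max(counts.values())
    # Crude text histogram; the bars should roughly trace a bell shape.
    for total, count in counts.items():
        print(f"{total:3d} {'#' * (count * 50 // peak)}")
```

With a single die every total is about equally likely, so the bell shape only starts to appear once several dice are summed per throw and the number of throws is large.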