Author Archive: csengerszabo

business intelligence dashboard

Can notebooks be successors of traditional BI tools?

The term Business Intelligence was first used in the 19th century, specifically in 1865. Richard Millar Devens used the term in his book, Commercial and Business Anecdotes. With this expression he described how a banker gained more profit by being…

oozie workflow

How to schedule Cloudera Impala data pipelines in Apache Oozie?

Oozie is a workflow scheduler built on Hadoop that lets us create workflows and schedule them. We can build data pipelines whose components can be Java code, Sqoop, Pig, Hive, shell scripts and so on. Inside a workflow, jobs can be defined to run either in parallel or in sequence. HUE provides a graphical interface for Oozie, where we can conveniently define, manage and monitor our jobs. Components…

Apache Zeppelin with Spark on Docker on Microsoft Azure

Generating normal distribution with Apache Zeppelin running on a Docker container on Microsoft Azure cloud platform

Creating this project was mainly motivated by a wish to try out the capabilities of Apache Zeppelin, which seems to have a lot of potential in the hands of a data scientist. The project is built around a dice game. The user can determine how many dice we throw and how many times we throw them. Exciting so far, right? What is more exciting is that, given a sufficient number of throws, we expect a Gaussian distribution curve to emerge. Let's see whether it does.

Apache Zeppelin

Apache Zeppelin is basically a notebook that focuses mainly on the Apache Spark engine. A…
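The dice game described in the excerpt above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Zeppelin and Spark code from the post itself; the function names `throw_dice` and `print_histogram` and the parameters `num_dice`, `num_throws` and `seed` are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def throw_dice(num_dice, num_throws, seed=None):
    """Throw `num_dice` six-sided dice `num_throws` times.

    Returns a Counter mapping each observed sum to how often it occurred.
    (Hypothetical helper for illustration; not taken from the original post.)
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(num_throws):
        # One throw: add up the faces of all dice.
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        totals[total] += 1
    return totals


def print_histogram(totals, width=50):
    """Print a text histogram, scaled so the most frequent sum spans `width` characters."""
    peak = max(totals.values())
    for total in sorted(totals):
        bar = "#" * round(totals[total] / peak * width)
        print(f"{total:3d} {bar}")


if __name__ == "__main__":
    # Assumed example values: 5 dice thrown 10,000 times.
    print_histogram(throw_dice(num_dice=5, num_throws=10_000, seed=42))
```

With several dice and a large number of throws, the histogram of sums should take on the familiar bell shape, since the sum of independent dice approaches a normal distribution as the number of dice grows.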

Data-driven organization

The keys of being data-driven

Being data-driven has become a must for companies that want to preserve their competitiveness. However, achieving that requires some important capabilities. Without them, a company cannot claim to be data-driven. Some still do, even though they do not possess all the keys of being…

Industry 4.0

The role of Big Data in creating Industry 4.0

Over the last 20 years, industry stakeholders have been able to reduce waste and improve product quality and yield. This was possible because they implemented Lean and Six Sigma methods within…