Using ZeppelinHub to share Zeppelin notebooks

If you already use Apache Zeppelin, you know the benefits of this kind of notebook for implementing data science applications. In an earlier post of mine, you can read about Apache Zeppelin and also a few words about ZeppelinHub. At that time I had not had the opportunity to try its features. Recently, however, I received my beta test account and finally got the chance to check out its really beneficial features. Let's take a look at them.

Integration

I have a Zeppelin instance running inside a Docker container. The GUI can be reached on port 8080 from the host, and the container can reach the internet.
I am using the dylanmei/zeppelin image. To pull it, type in:

    docker pull dylanmei/zeppelin

Integrating my instance with my ZeppelinHub account was quite easy. There is good documentation about it on the ZeppelinHub site.

  1. First, you have to download an integration library. It is a jar file that has to be copied into "ZEPPELIN_HOME/lib/".

    • If you are also running Zeppelin in Docker, you can copy the jar into the container with docker cp.
    • You can also open a shell inside your container with docker exec and download the jar there.
  2. Next, obtain a token for your Zeppelin instance. You can manage several Zeppelin instances with a single ZeppelinHub account, which is quite practical!

    • Create a new instance and give it a name. You can also include your Zeppelin credentials (if you have any) in the representation of the instance.
    • You get a special token for the Zeppelin instance; each instance has a unique one. If you have provided user credentials, you also get a user key along with your token.
  3. Now you have to set up the Zeppelin instance on your machine so that it gets synchronized. This is done with the token you have just received.

    • In order to do this, you only have to add some new environment variables for Zeppelin.
    • Create zeppelin-env.sh from zeppelin-env.sh.template, if it does not exist yet.
    • Edit this file and add the ZeppelinHub variables with their values, including the token from the previous step.
    • If you use the same Docker container, you won't have any fancy text editor inside it. Because of this, I usually copy the file out of the container with docker cp, edit it on my host machine, and copy it back.
    • Now you have to restart Zeppelin.
  4. Now all your notebooks should be synchronized with your ZeppelinHub account, and you can open them online!
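
The integration steps above can be sketched as a small shell script. Treat it as a sketch under assumptions, not as the commands from the ZeppelinHub documentation: the container name zeppelin, the install path /usr/zeppelin, the jar path argument, and the helper function names are all hypothetical. The three variable names and the ZeppelinHubRepo class name follow Apache Zeppelin's notebook storage documentation and may differ for the beta integration library, so check them against the instructions for your version.

```shell
#!/bin/sh
# Assumed values: adjust them to your own setup.
CONTAINER="${CONTAINER:-zeppelin}"               # hypothetical container name
ZEPPELIN_HOME="${ZEPPELIN_HOME:-/usr/zeppelin}"  # assumed install path inside the container

# Step 1: copy the downloaded integration jar into ZEPPELIN_HOME/lib/.
# $1 is the path of the jar on the host (the file name depends on your download).
copy_integration_jar() {
  docker cp "$1" "$CONTAINER:$ZEPPELIN_HOME/lib/"
}

# Step 3: create zeppelin-env.sh from the template if it is missing,
# then append the ZeppelinHub settings.
# $1 is a directory holding the conf files, $2 is the instance token.
write_hub_env() {
  conf_dir="$1"
  token="$2"
  env_file="$conf_dir/zeppelin-env.sh"
  [ -f "$env_file" ] || cp "$conf_dir/zeppelin-env.sh.template" "$env_file"
  # Variable and class names are assumptions; verify them for your version.
  cat >> "$env_file" <<EOF
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"
export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"
export ZEPPELINHUB_API_TOKEN="$token"
EOF
}

# Edit the configuration on the host and push it back, because the
# container has no comfortable text editor. $1 is the instance token.
sync_env_with_container() {
  work_dir="$(mktemp -d)"
  docker cp "$CONTAINER:$ZEPPELIN_HOME/conf/." "$work_dir"
  write_hub_env "$work_dir" "$1"
  docker cp "$work_dir/zeppelin-env.sh" "$CONTAINER:$ZEPPELIN_HOME/conf/zeppelin-env.sh"
  # Restarting the container restarts Zeppelin, assuming it is the main process.
  docker restart "$CONTAINER"
}
```

After sourcing the script, calling copy_integration_jar with the path of the downloaded jar and then sync_env_with_container with your token would cover steps 1 and 3. The functions only wrap docker cp and docker restart, so each step can also be run by hand.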


Added value of ZeppelinHub

Basically, ZeppelinHub makes it easier to share your data mining projects or data visualizations. You don't have to host your own server to publish them for others; instead, your notebooks are kept synchronized with their online representations in real time. And of course, your instance does not have to be online all the time for your notebooks to be viewable from the Hub, because syncing creates an online copy of them.
You can create different groups with different members. These groups are called spaces in ZeppelinHub terminology. They let you choose with whom you would like to share your notebooks. This is really flexible, because one space can hold multiple notebooks from different Zeppelin instances.

Still in beta

ZeppelinHub is still in beta, so we are eager to see new features and maybe true collaboration in the future. At the moment, "collaborating" only means that you and the other members of a space can comment on your notebooks. Even this feature didn't work well for me: it could not save my comments yet.
The other big thing I miss is that it is not yet possible to edit the notebooks. This means that if you have created blocks for input values, you cannot feed them new values, because you are not able to rerun the code block. So you cannot modify anything, but I think even this beta version is a good way of presenting your work!
So I am really looking forward to being able to use the 1.0 version!