Setup Mini Data Lake and Platform on M1 Mac: Part 2

Jun Li
Published in SelectFrom
5 min read · Jan 4, 2022

Photo by Glenn Carstens-Peters on Unsplash

Starting with this part, I will walk you through the setup and installation of the tools and services required to build the mini data lake on my M1 MacBook Pro.

This part covers installing the basic environment that the later setup depends on.

Install Basic Environment

The terminal is the tool you will interact with most during the setup. Instead of the built-in macOS Terminal, consider iTerm2, a more powerful alternative that can make you more productive. Then use zsh as the default shell. I recommend installing oh-my-zsh on top of it; it will prompt you to switch to zsh during installation. The good part of oh-my-zsh is that it supports plenty of plugins, which make your life much easier. If you want to keep the number of plugins minimal, I recommend installing at least zsh-autosuggestions and zsh-syntax-highlighting; for each of them, browse to the ‘Oh My Zsh’ option in its installation instructions.

I’m not a fan of the powerlevel10k theme; I prefer to keep things simple. You can specify the theme and plugins in the ~/.zshrc file.

example for zsh-autosuggestion
ZSH_THEME="simple"
plugins=(git zsh-autosuggestions zsh-syntax-highlighting)
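
Both plugins are third-party, so they have to be cloned into the Oh My Zsh custom plugins folder before the plugins=(...) line can load them. Here is a minimal sketch, assuming the repository location and target folder described in each plugin's ‘Oh My Zsh’ install instructions; the helper names omz_plugin_dir and install_omz_plugin are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative, not part of Oh My Zsh.

# Print the folder where Oh My Zsh looks for a custom plugin.
omz_plugin_dir() {
  echo "${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/$1"
}

# Clone a plugin from the zsh-users GitHub organization unless it is already there.
install_omz_plugin() {
  target="$(omz_plugin_dir "$1")"
  if [ -d "$target" ]; then
    echo "$1 already installed at $target"
  else
    git clone "https://github.com/zsh-users/$1" "$target"
  fi
}

# Usage (needs network access, so left commented out here):
# install_omz_plugin zsh-autosuggestions
# install_omz_plugin zsh-syntax-highlighting
```

After cloning, source ~/.zshrc or open a new terminal tab so the plugins are picked up.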

Install JDK8 and set JAVA_HOME

Java 8 is still a safe choice for compatibility with most data-related open source services and applications, such as Hadoop and Hive. I installed the Zulu build of OpenJDK on my laptop, as shown below.

Zulu JDK8 (LTS) for macOS ARM 64-bit

Once you have installed it, set the JAVA_HOME environment variable in your ~/.zshrc file, then source the file to make it take effect.

export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home
$ source ~/.zshrc

When you type ‘java -version’ in the terminal, you should see a response like the one below.

$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (Zulu 8.58.0.13-CA-macos-aarch64) (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (Zulu 8.58.0.13-CA-macos-aarch64) (build 25.312-b07, mixed mode)
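
If java -version prints something else, it helps to confirm what JAVA_HOME actually points at. This is a small sketch; check_java_home is a hypothetical helper, and /usr/libexec/java_home is the macOS utility that locates installed JDKs.

```shell
# Hypothetical helper: succeed only if the given JDK home contains a runnable java binary.
check_java_home() {
  home="${1:-${JAVA_HOME:-}}"
  if [ -n "$home" ] && [ -x "$home/bin/java" ]; then
    echo "OK: $home/bin/java"
  else
    echo "No executable java under '$home'" >&2
    return 1
  fi
}

# On macOS, ask the system where a JDK 8 is installed (skipped on other systems).
if [ -x /usr/libexec/java_home ]; then
  /usr/libexec/java_home -v 1.8 || true
fi
```

Calling check_java_home with no argument checks the current JAVA_HOME; passing a path checks that folder instead.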

Install Apache Maven

Apache Maven is a software project management and comprehension tool, used mainly for Java. You can download the binary tar.gz archive from the Apache Maven download page. Once you have extracted the archive, add the Maven commands to your ‘PATH’ in ~/.zshrc and then source it.

export PATH=$PATH:<maven home path>/bin
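
Because ~/.zshrc is read again every time you source it, a plain export keeps appending the same entry to PATH. Here is a small sketch that appends only when the entry is missing; path_append is a hypothetical helper, and <maven home path> is still whatever folder you extracted the archive to.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

# Usage in ~/.zshrc (replace the placeholder with your own Maven folder):
# path_append "<maven home path>/bin"
# Then confirm Maven and the JDK it picked up with:
# mvn -version
```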

Install Python

Python is a bit tricky on macOS with Apple Silicon. We normally install Anaconda, which includes most of the commonly used Python libraries for data science. However, at the time of writing it is not natively supported on ARM-based macOS, although you can still install its x86-64 distribution on top of Rosetta 2. Fortunately, conda-forge provides an experimental macOS ARM64 distribution called Miniforge. On my laptop, I installed both the standard Anaconda for macOS and Miniforge for macOS ARM64.

You can download the command-line installer from the Anaconda website; it lets you customize the install path and initialize Conda for zsh.

$ zsh <path to anaconda install script> 

Then just follow the step-by-step instructions. In the last step, select ‘Y’ to initialize the Conda environment within your ~/.zshrc.

To install Miniforge from conda-forge, you need to install Homebrew (brew) first. Check out this article for the full installation. You also need to install the Xcode command-line tools.

$ xcode-select --install

Then you can install Miniforge by running the command below:

$ brew install miniforge

Miniforge is installed at ‘/opt/homebrew/Caskroom/miniforge’ by default. To initialize the Miniforge version of Conda, run

$ /opt/homebrew/Caskroom/miniforge/base/bin/conda init zsh

The default Conda is now the Miniforge one. To switch to the Anaconda one, run

$ <anaconda_home>/bin/conda init zsh
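
Switching works because conda init maintains a managed block in ~/.zshrc, delimited by the comment markers `# >>> conda initialize >>>` and `# <<< conda initialize <<<`, and the most recent init rewrites it. Here is a small sketch for inspecting that block; show_conda_block is a hypothetical helper.

```shell
# Hypothetical helper: print the block that `conda init` manages in a shell rc file.
show_conda_block() {
  rc_file="${1:-$HOME/.zshrc}"
  sed -n '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</p' "$rc_file"
}

# Usage: the paths inside the block show which Conda install is active.
# show_conda_block
# After re-running `conda init zsh`, open a new terminal (or source ~/.zshrc) to apply it.
```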

You can also use the command below to list the available Conda environments.

$ conda env list
# the conda command here depends on which version you refer to, miniforge or anaconda
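
With two Conda installs side by side, it is worth checking whether the active shell or Python runs natively or through Rosetta 2. The reported machine architecture is a quick indicator: arm64 means native, while x86_64 on an M1 means translated. This is a minimal sketch; describe_arch is a hypothetical helper.

```shell
# Hypothetical helper: explain what an architecture string means on an M1 Mac.
describe_arch() {
  case "$1" in
    arm64)  echo "arm64: native Apple Silicon" ;;
    x86_64) echo "x86_64: Intel build (runs under Rosetta 2 on an M1)" ;;
    *)      echo "$1: unrecognized architecture" ;;
  esac
}

# Architecture of the current shell process.
describe_arch "$(uname -m)"

# Architecture of a Python interpreter, for example the one from Anaconda or Miniforge:
# describe_arch "$(python -c 'import platform; print(platform.machine())')"
```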

One advantage of Miniforge is that you can easily install TensorFlow for M1 Macs and run machine learning workloads that take advantage of the M1 GPU. I will cover this in Part 5.

For general-purpose Python applications or data science, you can still use the standard Anaconda distribution if performance is not critical, since it already includes most of the packages you need.

Install Docker Desktop

Installing Docker Desktop for Mac is quite straightforward. Download it from the Docker website and note the known issues listed on the download page. Once you finish the installation, enable Docker Compose V2 and configure the resource allocation. In my case, I allocated 4 CPUs and 16GB of memory. Allocate resources based on what your machine physically has, but I recommend at least 2 CPUs and 6GB of memory to keep overall performance acceptable. You also need to enable ‘Kubernetes’, which will be used in a later installation.
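
Once Docker Desktop has restarted, a quick check from the terminal can confirm that the CLI, Compose V2 and Kubernetes are reachable. This sketch assumes that Docker Desktop has put docker and kubectl on your PATH; report_tool is a hypothetical helper.

```shell
# Hypothetical helper: report whether a command is available on PATH.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

# Run the real checks only when the tools exist, so the snippet is safe to paste anywhere.
if report_tool docker; then
  docker compose version || true        # prints the Compose V2 version when it is enabled
fi
if report_tool kubectl; then
  kubectl config get-contexts || true   # Docker Desktop's cluster normally appears as docker-desktop
fi
```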

An engineer enthusiastic about software/web/mobile development, cloud native, data and AI.