Spark Streaming Rate Limiting and Back Pressure

As of Spark 1.6, streaming applications can apply back pressure, a form of rate limiting that lets the application absorb spikes in events without flooding your cluster...

Running Scala scripts in Spark shell

One of the great features of Apache Spark is the Spark shell. It allows for interactive analysis of data from a variety of data sources, be that HDFS, S3 or just your local file system. The Spark shell allows you...

Local Spark Setup

In this post I will explain the steps I took to set up a development environment on my MacBook for running Spark jobs using HDFS and YARN.

Install Hadoop & Spark

brew...