Technical Requirements


  • I get “AccessControlException: Permission denied: user=root, access=WRITE” when running Hadoop
    Create a home folder for your user in HDFS and make that user its owner (run the commands as the hdfs superuser)
    $ sudo -u hdfs hadoop fs -mkdir /user/root
    $ sudo -u hdfs hadoop fs -chown root /user/root
  • Hadoop does not find my Mapper/Reducer classes (ClassNotFoundException)
    Set the path to the jar containing your main class
    $ export HADOOP_CLASSPATH=/path/to/jar/myjar.jar:$HADOOP_CLASSPATH
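
The export above can be wrapped in a small helper so that it is safe to run repeatedly. This is only a sketch: the function name add_to_hadoop_classpath is hypothetical (not part of Hadoop), and it assumes jar paths contain no colons. It skips a jar that is already listed and avoids a stray trailing ":" when the variable starts out empty.

```shell
# Hypothetical helper (not a Hadoop command): prepend a jar to HADOOP_CLASSPATH.
add_to_hadoop_classpath() {
    jar=$1
    case ":${HADOOP_CLASSPATH:-}:" in
        *":$jar:"*) ;;                                  # already listed: nothing to do
        "::") HADOOP_CLASSPATH=$jar ;;                  # variable was empty or unset
        *) HADOOP_CLASSPATH=$jar:$HADOOP_CLASSPATH ;;   # prepend to the existing entries
    esac
    export HADOOP_CLASSPATH
}

# Same jar path as in the export line above
add_to_hadoop_classpath /path/to/jar/myjar.jar
```

Calling the helper twice with the same path leaves HADOOP_CLASSPATH unchanged the second time.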
  • I get “ImportError” when importing the numpy module in the PySpark shell
    Install easy_install and pip, then install numpy
    $ wget\/raw/bootstrap/
    $ python
    $ easy_install pip
    $ pip install numpy
  • sbt is not installed in Hortonworks
    Install it as follows
    $ curl | \
      sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
    $ sudo yum install sbt
    $ sbt -v sbtVersion           # Takes a long time
  • How to compile & execute Spark Streaming programs (written in Scala)
    Execute the following commands inside the project template
    $ sbt assembly     # Compiles the project and builds the jar for Spark
    $ spark-submit --class Tutorial target/scala-2.10/Tutorial.jar
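
The scala-2.10 part of the jar path depends on the Scala version the project is built with, so the jar may end up in a different target/scala-* folder. A minimal sketch for locating it, assuming the build writes its jar under target/scala-*/ and that jar names contain no newlines; find_assembly_jar is a hypothetical helper, not an sbt or Spark command.

```shell
# Hypothetical helper: print the most recently modified jar under
# <project_dir>/target/scala-*/ (prints nothing if no jar exists yet).
find_assembly_jar() {
    project_dir=${1:-.}
    # ls -t sorts by modification time, newest first; errors are silenced
    # so that a missing folder simply yields empty output.
    ls -t "$project_dir"/target/scala-*/*.jar 2>/dev/null | head -n 1
}

# Possible usage from inside the project template, after `sbt assembly`:
#   spark-submit --class Tutorial "$(find_assembly_jar .)"
```

If the helper prints nothing, the jar has not been built yet.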
  • How to reduce Spark's level of verbosity
    Replace the content of /etc/spark/ with:
    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
  • I get “ERROR 401: Authentication credentials” when executing the Twitter streaming example
    1. Ensure that you have set a valid consumer key/secret and access token/secret
    2. Synchronize your system clock (Hortonworks)
    $ sudo yum install ntp
    $ sudo ntpdate
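
The clock matters because the signed requests carry a timestamp, and a large offset from the server's time can cause otherwise valid credentials to be rejected. A rough sketch for measuring the offset, assuming you can obtain a trusted reference time as a Unix timestamp from some other source; clock_skew is a hypothetical helper, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
# Hypothetical helper: absolute difference in seconds between a reference
# Unix timestamp ($1) and the local clock (or an explicit timestamp in $2).
clock_skew() {
    ref=$1
    now=${2:-$(date +%s)}
    diff=$((now - ref))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))      # report the magnitude only
    fi
    echo "$diff"
}

# Example with two explicit timestamps that are 42 seconds apart
clock_skew 1700000000 1700000042
```

If the reported value is more than a few seconds, synchronizing the clock as in step 2 is worth trying before regenerating the keys.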