TECHNICAL GUIDE

Technical Requirements

FAQ

  • I get “AccessControlException: Permission denied: user=root, access=WRITE” when running Hadoop
    On a default installation root is not the HDFS superuser and has no home directory in HDFS. Create that directory as the hdfs user and make root its owner
    $ sudo -u hdfs hadoop fs -mkdir /user/root
    $ sudo -u hdfs hadoop fs -chown root /user/root
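    The same error appears for any user that has no home directory in HDFS. A minimal sketch that wraps the two steps in a shell function; the name create_hdfs_home is hypothetical, and it assumes that sudo is available and that hdfs is the HDFS superuser:

```shell
# Hypothetical helper (not part of Hadoop): create /user/<name> in HDFS
# and hand ownership to <name>. Assumes hdfs is the HDFS superuser.
create_hdfs_home() {
    user_name="$1"
    # -p makes mkdir succeed even if the directory already exists
    sudo -u hdfs hadoop fs -mkdir -p "/user/${user_name}" &&
    sudo -u hdfs hadoop fs -chown "${user_name}" "/user/${user_name}"
}

# Usage: create_hdfs_home root
```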
  • Hadoop does not find my Mapper/Reducer classes (ClassNotFoundException)
    Add the jar that contains your main class to HADOOP_CLASSPATH
    $ export HADOOP_CLASSPATH=/path/to/jar/myjar.jar:$HADOOP_CLASSPATH
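    If the export is repeated (for instance from a login script), the same jar ends up on the classpath several times. A minimal sketch that adds a jar only once; the function name add_to_hadoop_classpath is hypothetical and not part of Hadoop:

```shell
# Hypothetical helper (not part of Hadoop): prepend a jar to HADOOP_CLASSPATH
# unless it is already listed there.
add_to_hadoop_classpath() {
    case ":${HADOOP_CLASSPATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) HADOOP_CLASSPATH="$1${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}" ;;
    esac
    export HADOOP_CLASSPATH
}

add_to_hadoop_classpath /path/to/jar/myjar.jar
echo "$HADOOP_CLASSPATH"
```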
  • I get “ImportError” when importing the numpy module in the PySpark shell
    Install easy_install and pip, then install numpy
    $ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
    $ python ez_setup.py
    $ easy_install pip
    $ pip install numpy
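    numpy has to be installed for the interpreter that PySpark actually starts. A quick check, sketched under the assumption that this is either the binary named in PYSPARK_PYTHON or plain python; the helper name can_import is hypothetical:

```shell
# Hypothetical helper: succeeds if the given interpreter can import the module.
can_import() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# PYSPARK_PYTHON is the variable Spark reads to choose the Python binary;
# falling back to "python" is an assumption about this installation.
PYTHON_BIN="${PYSPARK_PYTHON:-python}"
if can_import "$PYTHON_BIN" numpy; then
    echo "numpy is importable from $PYTHON_BIN"
else
    echo "numpy is NOT importable from $PYTHON_BIN"
fi
```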
  • sbt is not installed in Hortonworks
    Install it as follows
    $ curl https://bintray.com/sbt/rpm/rpm | \
      sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
    $ sudo yum install sbt
    $ sbt -v sbtVersion           # Takes a lot of time
  • How to compile & execute Spark Streaming programs (written in Scala)
    Execute the following commands inside the project template
    $ sbt assembly     # Compiles and builds the jar for Spark
    $ spark-submit --class Tutorial target/scala-2.10/Tutorial.jar
  • How to reduce Spark's logging verbosity
    Replace the contents of /etc/spark/2.3.2.0-2950/0/log4j.properties with:
    
    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
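    If the existing file has local changes worth keeping, it may be enough to change only the root logger line. A sketch with sed, assuming the file already contains a log4j.rootCategory line; the function name set_root_log_level is hypothetical:

```shell
# Hypothetical helper: rewrite only the root logger line of a log4j.properties
# file. The original file is kept next to it with a .bak suffix.
set_root_log_level() {
    level="$1"
    file="$2"
    sed -i.bak "s/^log4j\.rootCategory=.*/log4j.rootCategory=${level}, console/" "$file"
}

# Usage (run as a user who may write the file):
# set_root_log_level WARN /etc/spark/2.3.2.0-2950/0/log4j.properties
```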
  • I get “ERROR 401: Authentication credentials” when executing the twitter streaming example
    1. Ensure that you have set valid consumer key/secret, access token/secret
    2. Synchronize your system clock (Hortonworks); requests signed with a timestamp that is too far off are rejected
    $ sudo yum install ntp
    $ sudo ntpdate ntp.ubuntu.com