Configurations

Azkaban can be configured in many ways. The following describes the knobs and switches that can be set. For the most part, there is no need to deviate from the default values.

Azkaban Web Server Configurations

These are properties to configure the web server. They should be set in azkaban.properties.

General Properties

Parameter | Description | Default
azkaban.name | The name of the Azkaban instance that will show up in the UI. Useful if you run more than one Azkaban instance. | Local
azkaban.label | A label to describe the Azkaban instance. | My Local Azkaban
azkaban.color | Hex value that sets a style color for the Azkaban UI. | #FF3601
azkaban.depth | Graph expansion level. Zero (0) means all flows are collapsed when the graph is first rendered. The default of 2 means flows are recursively expanded up to two levels down when the graph is first shown. | 2
web.resource.dir | Directory for the UI’s CSS and JavaScript files. | web/
default.timezone | The timezone that will be displayed by Azkaban. | America/Los_Angeles
viewer.plugin.dir | Directory where viewer plugins are installed. | plugins/viewer
job.max.Xms | The maximum initial amount of memory each job can request. This validation is performed at project upload time. | 1GB
job.max.Xmx | The maximum amount of memory each job can request. This validation is performed at project upload time. | 2GB
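
As a rough illustration, the general properties above might appear in azkaban.properties as shown below. This is only a sketch that restates the documented defaults; it is not a recommended configuration.

```properties
# Sketch of the general web server properties, using the documented defaults
azkaban.name=Local
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
# Expand flows up to two levels when the graph is first shown
azkaban.depth=2
web.resource.dir=web/
default.timezone=America/Los_Angeles
viewer.plugin.dir=plugins/viewer
```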

Multiple Executor Mode Parameters

Parameter | Description | Default
azkaban.use.multiple.executors | Whether Azkaban runs in multi-executor mode. Must be set to true for multiple executor mode. | false
azkaban.executorselector.filters | A comma-separated list of hard filters to be used while dispatching. To be chosen from StaticRemainingFlowSize, MinimumFreeMemory and CpuStatus. The order of the filters does not matter. |
azkaban.executorselector.comparator.{ComparatorName} | Integer weight to be used to rank available executors for a given flow. Currently, {ComparatorName} can be NumberOfAssignedFlowComparator, Memory, LastDispatched or CpuUsage. For example: azkaban.executorselector.comparator.Memory=2 |
azkaban.queueprocessing.enabled | Whether the queue processor is enabled at web server initialization. | true
azkaban.webserver.queue.size | Maximum number of flows that can be queued at the web server. | 100000
azkaban.activeexecutor.refresh.milisecinterval | Maximum time in milliseconds that queued flows can be processed without an executor statistics refresh. | 50000
azkaban.activeexecutor.refresh.flowinterval | Maximum number of queued flows that can be processed without an executor statistics refresh. | 5
azkaban.executorinfo.refresh.maxThreads | Maximum number of threads used to refresh executor statistics. | 5
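
A multi-executor setup could be sketched in azkaban.properties roughly as follows. The choice of filters and the comparator weights here are illustrative assumptions, apart from the Memory weight of 2, which is the example given above. The remaining values are the documented defaults.

```properties
# Enable multi-executor mode (the documented default is false)
azkaban.use.multiple.executors=true
# Hard filters, comma-separated; the order does not matter
azkaban.executorselector.filters=MinimumFreeMemory,CpuStatus
# Comparator weights used to rank executors (illustrative values)
azkaban.executorselector.comparator.Memory=2
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
# Documented defaults for queue processing
azkaban.queueprocessing.enabled=true
azkaban.webserver.queue.size=100000
```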

Jetty Parameters

Parameter | Description | Default
jetty.maxThreads | Max request threads | 25
jetty.ssl.port | The SSL port | 8443
jetty.keystore | The keystore file |
jetty.password | The jetty password |
jetty.keypassword | The keypassword |
jetty.truststore | The trust store |
jetty.trustpassword | The trust password |
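
For orientation, an SSL setup using these parameters might look like the sketch below. The keystore and trust store file names and all password values are hypothetical placeholders, since the documentation gives no defaults for them.

```properties
# Documented defaults
jetty.maxThreads=25
jetty.ssl.port=8443
# Hypothetical file names and placeholder secrets; replace with your own
jetty.keystore=keystore
jetty.password=<keystore-password>
jetty.keypassword=<key-password>
jetty.truststore=keystore
jetty.trustpassword=<truststore-password>
```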

Project Manager Settings

Parameter | Description | Default
project.temp.dir | The temporary directory used when uploading projects. | temp
project.version.retention | The number of unused project versions retained before cleaning. | 3
creator.default.proxy | Automatically add the creator of a project as a proxy user to the project. | true
lockdown.create.projects | Prevents anyone except users with Admin roles from creating new projects. | false
lockdown.upload.projects | Prevents anyone except admin users and users with permissions from uploading projects. | false

MySQL Connection Parameter

Parameter | Description | Default
database.type | The database type. Currently, the only database supported is mysql. | mysql
mysql.port | The port to the mysql db | 3306
mysql.host | The mysql host | localhost
mysql.database | The mysql database |
mysql.user | The mysql user |
mysql.password | The mysql password |
mysql.numconnections | The number of connections that Azkaban web client can open to the database | 100
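
A database connection block might be written along these lines. The database name, user, and password below are hypothetical placeholders, because no defaults are documented for them; the other values are the defaults from the table.

```properties
database.type=mysql
mysql.port=3306
mysql.host=localhost
# Hypothetical database name and credentials; replace with your own
mysql.database=azkaban
mysql.user=azkaban
mysql.password=<mysql-password>
mysql.numconnections=100
```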

Executor Manager Properties

Parameter | Description | Default
execution.logs.retention.ms | Time in milliseconds that execution logs are retained | 7257600000L (12 weeks)

Notification Email Properties

Parameter | Description | Default
mail.sender | The email address that Azkaban uses to send emails. |
mail.host | The email server host machine. | localhost
mail.port | The email server port. | 25
mail.user | The email server user name. |
mail.password | The email server password. |
mail.tls | Use TLS for the connection. | false
mail.useAuth | Use authentication. | true
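
A notification email configuration could be sketched as follows. The sender address, user name, and password are hypothetical placeholders; the host, port, TLS, and authentication values are the documented defaults.

```properties
# Hypothetical sender address
mail.sender=azkaban@example.com
mail.host=localhost
mail.port=25
# Hypothetical credentials, used because mail.useAuth is true
mail.user=<mail-user>
mail.password=<mail-password>
mail.tls=false
mail.useAuth=true
```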

User Manager Properties

Parameter | Description | Default
user.manager.class | The user manager that is used to authenticate a user. The default is an XML user manager, but it can be overridden to support other authentication methods, such as JNDI. | azkaban.user.XmlUserManager
user.manager.xml.file | XML file for the XmlUserManager | conf/azkaban-users.xml

User Session Properties

Parameter | Description | Default
session.time.to.live | The session time to live in milliseconds | 86400000
max.num.sessions | The maximum number of sessions before people are evicted. | 10000
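
Taken together, the user manager and session properties above might be set as in this sketch, which only restates the documented defaults. A session time to live of 86400000 ms corresponds to 24 hours.

```properties
# Default XML-based user manager and its user file
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# 86400000 ms = 24 hours
session.time.to.live=86400000
max.num.sessions=10000
```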

Azkaban Executor Server Configuration

Executor Server Properties

Parameter | Description | Default
executor.port | The port for the Azkaban executor server | 0 (any free port)
executor.global.properties | A path to the properties that will be the parent for all jobs. | none
azkaban.execution.dir | The folder for execution working directories | executions
azkaban.project.dir | The folder for storing temporary copies of project files used for executions | projects
executor.flow.threads | The number of simultaneous flows that can be run. These threads are mostly idle. | 30
job.log.chunk.size | For rolling job logs. The chunk size for each rollover. | 5MB
job.log.backup.index | The number of log chunks. The maximum size of each log is then index * chunk size. | 4
flow.num.job.threads | The number of concurrently running jobs in each flow. These threads are mostly idle. | 10
job.max.Xms | The maximum initial amount of memory each job can request. If a job requests more than this, the Azkaban server will not launch the job. | 1GB
job.max.Xmx | The maximum amount of memory each job can request. If a job requests more than this, the Azkaban server will not launch the job. | 2GB
azkaban.server.flow.max.running.minutes | The maximum time in minutes a flow can live inside Azkaban after being executed. If a flow runs longer than this, it will be killed. If less than or equal to 0, there’s no restriction on running time. | -1
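
As a sketch, an executor server's azkaban.properties could carry the documented defaults as shown below. With the default chunk size and backup index, the maximum size of each job log works out to 4 * 5MB = 20MB.

```properties
# 0 lets the executor pick any free port
executor.port=0
azkaban.execution.dir=executions
azkaban.project.dir=projects
executor.flow.threads=30
flow.num.job.threads=10
# Rolling job logs: 4 chunks of 5MB each, so at most 20MB per log
job.log.chunk.size=5MB
job.log.backup.index=4
# A value of 0 or less means no limit on flow running time
azkaban.server.flow.max.running.minutes=-1
```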

MySQL Connection Parameter

Parameter | Description | Default
database.type | The database type. Currently, the only database supported is mysql. | mysql
mysql.port | The port to the mysql db | 3306
mysql.host | The mysql host | localhost
mysql.database | The mysql database |
mysql.user | The mysql user |
mysql.password | The mysql password |
mysql.numconnections | The number of connections that Azkaban web client can open to the database | 100

Plugin Configurations

Execute-As-User

With a new security enhancement in Azkaban 3.0, Azkaban jobs can now run as the submit user or the user.to.proxy of the flow by default. This ensures that Azkaban takes advantage of the Linux permission security mechanism, and operationally it simplifies resource monitoring and visibility. Set up this behavior by doing the following:

  • execute.as.user is set to true by default. If needed, it can be set to false in azkaban-plugin’s commonprivate.properties.
  • Configure azkaban.native.lib= to point to the place where you are going to put the compiled execute-as-user.c file (see below).
  • Generate an executable on the Azkaban box for azkaban-common/src/main/c/execute-as-user.c. It should be named execute-as-user. Below is a sample approach:
      • scp ./azkaban-common/src/main/c/execute-as-user.c onto the Azkaban box
      • run: gcc execute-as-user.c -o execute-as-user
      • run: chown root execute-as-user (you might need root privilege)
      • run: chmod 6050 execute-as-user (you might need root privilege)
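
The two properties from the steps above might then be set in commonprivate.properties as in this sketch. The directory shown is a hypothetical location, assumed here to be where the compiled execute-as-user binary was placed.

```properties
# Run jobs as the submit user or user.to.proxy (documented default)
execute.as.user=true
# Hypothetical directory containing the compiled execute-as-user binary
azkaban.native.lib=/opt/azkaban/native-lib
```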