Training Scripts

Generate Tokenized Dataset

This script generates a tokenized dataset from a HuggingFace dataset and stores it on disk.

Parameters:

  • --save_path SAVE_PATH: Path to the directory in which the generated dataset is saved

  • --min_tokens MIN_TOKENS: Minimum number of tokens a context must contain in order to be included in the generated dataset

  • --num_processes NUM_PROCESSES: Number of concurrent processes for downloading contexts

  • --tokenizer_name TOKENIZER_NAME: HuggingFace name of the tokenizer used to tokenize the contexts. Currently only works with CodeLlama tokenizers

  • --dataset_name DATASET_NAME: HuggingFace name of the dataset to load data from

  • num_files: Number of files to generate. If multiprocessing is used, this has to be a multiple of num_processes

  • num_samples_per_file: Number of tokenized contexts per file

Example:

python3 generate_tokenized_dataset.py --save_path ~/llama_tokenized_dataset --min_tokens 256 --num_processes 16 16 10000

Generates a tokenized dataset of samples containing at least 256 tokens each, using 16 processes, and saves it into 16 files, each containing 10000 tokenized contexts.
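
The overall flow is roughly the following. This is a minimal sketch under assumptions: the streaming download, the "content" field name, and the per-file storage format are illustrative, not necessarily what the script actually does.

    from datasets import load_dataset
    from transformers import AutoTokenizer
    import torch

    def generate_files(save_path, dataset_name, tokenizer_name, min_tokens,
                       num_files, num_samples_per_file):
        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        stream = iter(load_dataset(dataset_name, split="train", streaming=True))

        for file_idx in range(num_files):
            samples = []
            while len(samples) < num_samples_per_file:
                context = next(stream)["content"]  # field name is an assumption
                tokens = tokenizer(context, return_tensors="pt").input_ids[0]
                if tokens.numel() >= min_tokens:   # keep only sufficiently long contexts
                    samples.append(tokens)
            torch.save(samples, f"{save_path}/{file_idx}.pt")

With --num_processes > 1, the file range is presumably sharded across the worker processes, which is why num_files has to be a multiple of num_processes.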

Train Autoencoder

This script trains a sparse autoencoder model.

Parameters:

  • --num_batches NUM_BATCHES: Number of (autoencoder-sized) batches used to train the autoencoder

  • --dataset_path DATASET_PATH: Path to the pretokenized dataset

  • --save_path SAVE_PATH: Path to save the trained autoencoders to

  • --model_name MODEL_NAME: HuggingFace name of the model from which activations are obtained. Currently only works with CodeLlama models

  • --batch_size_llm BATCH_SIZE_LLM: Batch size used when obtaining activations from the LLM

  • --batch_size_autoencoder BATCH_SIZE_AUTOENCODER: Batch size used to train the autoencoder

  • --num_tokens NUM_TOKENS: Number of tokens used to train the autoencoder

  • --device_llm DEVICE_LLM: Device to load the LLM onto

  • --device_autoencoder DEVICE_AUTOENCODER: Device to load the autoencoder onto

  • --learning_rate LEARNING_RATE: Learning rate for autoencoder training

  • --l1_coefficient L1_COEFFICIENT: L1 (sparsity) coefficient for autoencoder training; the loss it enters is sketched after this parameter list

  • --act_vec_size ACT_VEC_SIZE: Size of the activation vector fed into the autoencoder

  • --dict_vec_size DICT_VEC_SIZE: Size of the dictionary vector produced by the autoencoder

  • --batches_between_ckpt BATCHES_BETWEEN_CKPT: Number of batches to train the autoencoder before the model is saved as a checkpoint file and a feature-frequency image is generated

  • --num_batches_preload NUM_BATCHES_PRELOAD: Buffer size (in batches) of activation vectors for training. The buffer is refilled whenever it runs empty; a larger buffer means more randomness (better shuffling) during training

  • --neuron_resampling_method NEURON_RESAMPLING_METHOD: Strategy for neuron resampling. Currently available: 'replacement', 'anthropic'

  • --neuron_resampling_interval NEURON_RESAMPLING_INTERVAL: Number of (autoencoder-sized) batches to train between neuron resamplings

  • --normalize_dataset: If set, all activation vectors in the dataset are normalized to an L2 norm of sqrt(n) (with n being the input dimension of the autoencoder)

  • --mlp_activations_hookpoint MLP_ACTIVATIONS_HOOKPOINT: Hookpoint description for MLP activations, e.g. model.layers.{}.mlp.act_fn ({} is replaced by the layer index); a sketch of how such a hookpoint is resolved follows the example below

  • --mlp_sublayer_hookpoint MLP_SUBLAYER_HOOKPOINT: Hookpoint description for the MLP sublayer, e.g. model.layers.{}.mlp ({} is replaced by the layer index)

  • --attn_sublayer_hookpoint ATTN_SUBLAYER_HOOKPOINT: Hookpoint description for the attention sublayer, e.g. model.layers.{}.self_attn ({} is replaced by the layer index)

  • layer_id: ID of the layer from which activations are obtained

  • layer_type: Type of layer from which activations are collected. Select 'attn_sublayer', 'mlp_sublayer' or 'mlp_activations'.
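
For orientation, these parameters control the standard sparse-autoencoder objective: reconstruct each activation vector while keeping its dictionary representation sparse. Below is a minimal sketch of that objective and of the --normalize_dataset rescaling, assuming a plain linear encoder/decoder with a ReLU; class and function names are illustrative, not the script's actual implementation.

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, act_vec_size, dict_vec_size):
            super().__init__()
            self.encoder = nn.Linear(act_vec_size, dict_vec_size)  # activation -> dictionary vector
            self.decoder = nn.Linear(dict_vec_size, act_vec_size)  # dictionary vector -> reconstruction

        def forward(self, x):
            f = torch.relu(self.encoder(x))  # sparse dictionary activations
            return self.decoder(f), f

    def loss_fn(x, x_hat, f, l1_coefficient):
        # reconstruction error plus L1 sparsity penalty on the dictionary activations
        return torch.mean((x - x_hat) ** 2) + l1_coefficient * f.abs().sum(dim=-1).mean()

    def normalize(x):
        # --normalize_dataset: rescale each activation vector to L2 norm sqrt(n),
        # where n is the input dimension of the autoencoder (act_vec_size)
        n = x.shape[-1]
        return x * (n ** 0.5) / x.norm(dim=-1, keepdim=True)

act_vec_size has to match the dimensionality of the activations captured at the chosen hookpoint, and l1_coefficient trades reconstruction quality against sparsity.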

Example:

python3 train_autoencoder_from_tokens.py --num_batches 50000 --dataset_path ~/tokenized_dataset/ --save_path ~/autoencoders/l19_lr2e-4_spar0.5 --batch_size_llm 16 --batch_size_autoencoder 1024 --num_tokens 64 --device_llm cuda:0 --device_autoencoder cuda:1 --learning_rate 0.0002 --batches_between_ckpt 5000 --num_batches_preload 5000 19 mlp_activations
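
In the example above, activations are collected from layer 19's MLP activation function, i.e. the hookpoint string model.layers.{}.mlp.act_fn is filled with the layer index 19. A minimal sketch of how such a hookpoint can be resolved into a PyTorch forward hook (the helper below is illustrative and not part of the script):

    import torch

    def capture_activations(model, hookpoint_template, layer_id, input_ids):
        """Run a forward pass and return the output captured at the given hookpoint."""
        module = model.get_submodule(hookpoint_template.format(layer_id))
        captured = []

        def hook(mod, inputs, output):
            captured.append(output.detach())

        handle = module.register_forward_hook(hook)
        with torch.no_grad():
            model(input_ids)
        handle.remove()
        return captured[0]  # e.g. (batch, seq_len, hidden_dim)

    # hypothetical usage with a loaded CodeLlama model:
    # acts = capture_activations(model, "model.layers.{}.mlp.act_fn", 19, input_ids)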

Generate Interpretation Samples

This script samples activations of the sparse autoencoder's neurons and generates heuristics about how they activate.

Parameters:

  • --dataset_path DATASET_PATH: Path of the tokenized dataset used to obtain interpretation samples

  • --autoencoder_path AUTOENCODER_PATH: Path of the autoencoder model to analyze

  • --save_path SAVE_PATH: Path to save the interpretation samples to

  • --target_model_name TARGET_MODEL_NAME: Name of the target model. Currently only CodeLlama models are supported

  • --target_model_device TARGET_MODEL_DEVICE: Device of the target model

  • --autoencoder_device AUTOENCODER_DEVICE: Device of the autoencoder

  • --log_freq_upper LOG_FREQ_UPPER: Maximum log feature frequency at which a feature is interpreted

  • --log_freq_lower LOG_FREQ_LOWER: Minimum log feature frequency at which a feature is interpreted

  • num_samples: Number of interpretation samples to obtain

Comment on log_freq_upper and log_freq_lower: Training an SAE produces not only the model but also frequency histograms. Features that activate very often or very rarely may not be of interest and can be excluded from interpretation (which is very resource-intensive). The values of log_freq_upper and log_freq_lower refer to \(\log_{10}(\text{activation probability of a neuron})\). So if log_freq_upper is \(-0.1\), features with an activation probability higher than \(10^{-0.1}\) are discarded.
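
As a sketch of the resulting filter (variable names are illustrative; the script may compute this differently): a feature that fires on 1 in 1000 tokens has a log feature frequency of \(\log_{10}(0.001) = -3\) and would pass a filter with log_freq_lower = -4 and log_freq_upper = -1.

    import torch

    def select_features(activation_counts, num_tokens, log_freq_lower, log_freq_upper):
        # activation_counts[i]: number of tokens on which feature i was active
        # num_tokens: total number of tokens observed
        log_freq = torch.log10(activation_counts.float() / num_tokens)
        mask = (log_freq >= log_freq_lower) & (log_freq <= log_freq_upper)
        return torch.nonzero(mask).squeeze(-1)  # indices of features to interpret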

Example:

python3 generate_interpretation_samples.py --dataset_path ~/tokenized_dataset/ --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt --save_path ~/interp_samples_l19.pt --target_model_device cuda:3 --autoencoder_device cuda:2 10000
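
Conceptually, the sampling loop looks roughly as follows, under the assumption that an interpretation sample is a set of highest-activating contexts per feature (the heuristics the script actually stores may differ; get_llm_activations and autoencoder.encode are assumed helpers):

    import heapq
    import torch

    def sample_top_contexts(contexts, get_llm_activations, autoencoder, feature_ids, k=20):
        """Keep the k highest-activating contexts for each selected SAE feature."""
        top = {f: [] for f in feature_ids}  # feature id -> min-heap of (score, index, context)
        for i, context in enumerate(contexts):
            acts = get_llm_activations(context)        # (seq_len, act_vec_size) from the hookpoint
            with torch.no_grad():
                dictionary = autoencoder.encode(acts)  # (seq_len, dict_vec_size)
            for f in feature_ids:
                score = dictionary[:, f].max().item()  # strongest activation of feature f in this context
                heapq.heappush(top[f], (score, i, context))
                if len(top[f]) > k:
                    heapq.heappop(top[f])              # drop the weakest of the k+1 entries
        return top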

Interpret Autoencoder

This script interprets the sparse autoencoder. It can only be started with DeepSpeed!

Parameters:

  • --dataset_path DATASET_PATH: Path of the tokenized dataset used to obtain interpretation samples

  • --interpretation_samples_path INTERPRETATION_SAMPLES_PATH: Path of the saved interpretation samples

  • --interpretation_model_name INTERPRETATION_MODEL_NAME: Name of the interpretation model. Currently only CodeLlama models are supported

  • --num_gpus NUM_GPUS: Number of GPUs to use

  • --autoencoder_path AUTOENCODER_PATH: Path to the autoencoder to interpret

  • --num_interpretation_samples NUM_INTERPRETATION_SAMPLES: Number of interpretation samples to use

  • --num_simulation_samples NUM_SIMULATION_SAMPLES: Number of simulation samples to use

  • --local_rank LOCAL_RANK: Local rank

  • --ssl_cert SSL_CERT: Path to the SSL certificate of the ElasticSearch server

  • server_address: Address of the ElasticSearch server (e.g. https://DOMAIN:PORT)

  • api_key: API key for the ElasticSearch server

Example:

deepspeed interpret_autoencoder_deepspeed.py --dataset_path ~/tokenized_dataset --interpretation_samples_path ~/interp_samples_l19.pt --num_gpus 4 --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt 127.0.0.1:1234 API_KEY