# Training Scripts

#### Generate Tokenized Dataset

This script generates a tokenized dataset from a HuggingFace dataset and stores it on disk.

Parameters:

* `--save_path SAVE_PATH`: Path to the directory the generated dataset is saved to
* `--min_tokens MIN_TOKENS`: Minimum number of tokens a context has to contain in order to be included in the generated dataset (see the sketch after this section)
* `--num_processes NUM_PROCESSES`: Number of concurrent processes for downloading contexts
* `--tokenizer_name TOKENIZER_NAME`: HuggingFace name of the tokenizer used to tokenize the contexts. Currently only works for CodeLlama tokenizers
* `--dataset_name DATASET_NAME`: HuggingFace name of the dataset to load data from
* `num_files`: Number of files to generate. If multiprocessing is used, this has to be a multiple of `num_processes`
* `num_samples_per_file`: Number of tokenized contexts per file

Example:

```bash
python3 generate_tokenized_dataset.py --save_path ~/llama_tokenized_dataset --min_tokens 256 --num_processes 16 16 10000
```

This generates a tokenized dataset using 16 processes, keeping only contexts that contain at least 256 tokens, and saves it into 16 files with 10000 tokenized contexts each.
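The `--min_tokens` filter can be pictured as follows. This is an illustrative sketch only: the helper name `keep_context` and the concrete CodeLlama tokenizer checkpoint are assumptions made for the example, not part of the script.

```python
# Illustrative sketch of the --min_tokens filter (not the script's actual code).
# Assumes the HuggingFace `transformers` library and a CodeLlama tokenizer.
from transformers import AutoTokenizer


def keep_context(text: str, tokenizer, min_tokens: int) -> bool:
    """Return True if the context tokenizes to at least `min_tokens` tokens."""
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return len(token_ids) >= min_tokens


if __name__ == "__main__":
    # Checkpoint name chosen only for this example.
    tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
    short_context = "def add(a, b):\n    return a + b\n"
    print(keep_context(short_context, tokenizer, min_tokens=256))  # False: too short
```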
#### Train Autoencoder

This script trains a sparse autoencoder model.

Parameters:

* `--num_batches NUM_BATCHES`: Number of batches (autoencoder-sized) to train the autoencoder on
* `--dataset_path DATASET_PATH`: Path to the pretokenized dataset
* `--save_path SAVE_PATH`: Path to save the trained autoencoders to
* `--model_name MODEL_NAME`: HuggingFace name of the model from which activations are obtained. Currently only works for CodeLlama models
* `--batch_size_llm BATCH_SIZE_LLM`: Batch size used to obtain activations from the LLM
* `--batch_size_autoencoder BATCH_SIZE_AUTOENCODER`: Batch size used to train the autoencoder
* `--num_tokens NUM_TOKENS`: Number of tokens used to train the autoencoder
* `--device_llm DEVICE_LLM`: Device to load the LLM to
* `--device_autoencoder DEVICE_AUTOENCODER`: Device to load the autoencoder to
* `--learning_rate LEARNING_RATE`: Learning rate for autoencoder training
* `--l1_coefficient L1_COEFFICIENT`: L1 (sparsity) coefficient for autoencoder training
* `--act_vec_size ACT_VEC_SIZE`: Size of the activation vector fed into the autoencoder
* `--dict_vec_size DICT_VEC_SIZE`: Size of the dictionary vector produced by the autoencoder
* `--batches_between_ckpt BATCHES_BETWEEN_CKPT`: Number of batches to train the autoencoder before the model is saved as a checkpoint file and a feature-frequencies image is generated
* `--num_batches_preload NUM_BATCHES_PRELOAD`: Buffer size (in batches) of activation vectors for training. The buffer is refilled once it is empty; a larger buffer gives more randomness during training
* `--neuron_resampling_method NEURON_RESAMPLING_METHOD`: Strategy for neuron resampling. Currently available: 'replacement', 'anthropic'
* `--neuron_resampling_interval NEURON_RESAMPLING_INTERVAL`: Number of batches (autoencoder-sized) to train between neuron resamplings
* `--normalize_dataset`: If set, all activation vectors in the dataset are normalized to an L2 norm of sqrt(n) (with n being the input dimension of the autoencoder)
* `--mlp_activations_hookpoint MLP_ACTIVATIONS_HOOKPOINT`: Hookpoint description for MLP activations, e.g. model.layers.{}.mlp.act_fn ({} is the layer index)
* `--mlp_sublayer_hookpoint MLP_SUBLAYER_HOOKPOINT`: Hookpoint description for the MLP sublayer, e.g. model.layers.{}.mlp ({} is the layer index)
* `--attn_sublayer_hookpoint ATTN_SUBLAYER_HOOKPOINT`: Hookpoint description for the attention sublayer, e.g. model.layers.{}.self_attn ({} is the layer index)
* `layer_id`: ID of the layer from which the activations are obtained
* `layer_type`: Type of layer from which the activations are collected. Select 'attn_sublayer', 'mlp_sublayer' or 'mlp_activations'

Example:

```bash
python3 train_autoencoder_from_tokens.py --num_batches 50000 --dataset_path ~/tokenized_dataset/ --save_path ~/autoencoders/l19_lr2e-4_spar0.5 --batch_size_llm 16 --batch_size_autoencoder 1024 --num_tokens 64 --device_llm cuda:0 --device_autoencoder cuda:1 --learning_rate 0.0002 --batches_between_ckpt 5000 --num_batches_preload 5000 19 mlp_activations
```

#### Generate Interpretation Samples

This script samples activations of the sparse autoencoder's neurons and generates heuristics about their activations.

Parameters:

* `--dataset_path DATASET_PATH`: Path of the tokenized dataset used for obtaining interpretation samples
* `--autoencoder_path AUTOENCODER_PATH`: Path of the autoencoder model to analyze
* `--save_path SAVE_PATH`: Path to save the interpretation samples to
* `--target_model_name TARGET_MODEL_NAME`: Name of the target model. Currently, only CodeLlama models are supported
* `--target_model_device TARGET_MODEL_DEVICE`: Device of the target model
* `--autoencoder_device AUTOENCODER_DEVICE`: Device of the autoencoder
* `--log_freq_upper LOG_FREQ_UPPER`: Maximal log feature frequency at which a feature should be interpreted
* `--log_freq_lower LOG_FREQ_LOWER`: Minimal log feature frequency at which a feature should be interpreted
* `num_samples`: Number of interpretation samples to obtain

Comment on `log_freq_upper` and `log_freq_lower`: Training an SAE produces the model and also frequency histograms. Features that activate very often or very rarely might not be of interest and can be excluded from interpretation (which is very resource-intensive). The values of `log_freq_upper` and `log_freq_lower` refer to $\log_{10}(\text{probability that the neuron activates})$. So if `log_freq_upper` is $-0.1$, features with an activation probability higher than $10^{-0.1}$ are discarded (see the sketch after the example below).

Example:

```bash
python3 generate_interpretation_samples.py --dataset_path ~/tokenized_dataset/ --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt --save_path ~/interp_samples_l19.pt --target_model_device cuda:3 --autoencoder_device cuda:2 10000
```
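To make the frequency filter concrete, here is a small sketch of how features could be selected from their activation frequencies. It only illustrates the rule described above; the function and variable names are made up for this example and are not part of the script.

```python
# Illustrative sketch of the log-frequency filter implied by
# --log_freq_lower / --log_freq_upper (not the script's actual code).
import numpy as np


def select_features(activation_probs: np.ndarray,
                    log_freq_lower: float,
                    log_freq_upper: float) -> np.ndarray:
    """Return indices of features whose log10 activation probability
    lies inside [log_freq_lower, log_freq_upper]."""
    log_freqs = np.log10(activation_probs)
    mask = (log_freqs >= log_freq_lower) & (log_freqs <= log_freq_upper)
    return np.nonzero(mask)[0]


# With log_freq_upper = -0.1, a feature firing on 90% of tokens
# (log10(0.9) is about -0.05, which is above -0.1) is discarded; an extremely
# rare feature (probability 1e-6) falls below log_freq_lower = -5 and is also
# discarded.
probs = np.array([0.9, 0.05, 1e-6])
print(select_features(probs, log_freq_lower=-5.0, log_freq_upper=-0.1))  # -> [1]
```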
#### Interpret Autoencoder

This script interprets the sparse autoencoder. **It can only be launched with DeepSpeed!**

Parameters:

* `--dataset_path DATASET_PATH`: Path of the tokenized dataset used for obtaining interpretation samples
* `--interpretation_samples_path INTERPRETATION_SAMPLES_PATH`: Path of the saved interpretation samples
* `--interpretation_model_name INTERPRETATION_MODEL_NAME`: Name of the interpretation model. Currently, only CodeLlama models are supported
* `--num_gpus NUM_GPUS`: Number of GPUs to use
* `--autoencoder_path AUTOENCODER_PATH`: Path to the autoencoder to interpret
* `--num_interpretation_samples NUM_INTERPRETATION_SAMPLES`: Number of interpretation samples to use
* `--num_simulation_samples NUM_SIMULATION_SAMPLES`: Number of simulation samples to use
* `--local_rank LOCAL_RANK`: Local rank
* `--ssl_cert SSL_CERT`: Path to the SSL certificate of the ElasticSearch server
* `server_address`: Address of the ElasticSearch server (e.g. https://DOMAIN:PORT)
* `api_key`: API key for the ElasticSearch server

Example:

```bash
deepspeed interpret_autoencoder_deepspeed.py --dataset_path ~/tokenized_dataset --interpretation_samples_path ~/interp_samples_l19.pt --num_gpus 4 --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt 127.0.0.1:1234 API_KEY
```
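For orientation, the sketch below shows one way the ElasticSearch arguments (`server_address`, `api_key`, `--ssl_cert`) could map onto the official `elasticsearch` Python client. This mapping is an assumption made for illustration; the script's actual connection code may differ.

```python
# Hypothetical sketch of combining the ElasticSearch arguments with the
# official `elasticsearch` Python client (an assumption, not the script's code).
from typing import Optional

from elasticsearch import Elasticsearch


def connect(server_address: str, api_key: str, ssl_cert: Optional[str] = None) -> Elasticsearch:
    """Open a client against the ElasticSearch server used for interpretations."""
    return Elasticsearch(
        server_address,      # positional server_address, e.g. https://DOMAIN:PORT
        api_key=api_key,     # positional api_key argument
        ca_certs=ssl_cert,   # path passed via --ssl_cert, if any
    )


# Example usage (requires a reachable ElasticSearch instance):
# client = connect("https://localhost:9200", api_key="API_KEY", ssl_cert="~/http_ca.crt")
# print(client.info())
```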