As using the ChatGPT API becomes more and more expensive, and the number of tokens you can send is limited, there comes a point where you have to look for alternatives. That's where Llama comes in! If your hardware is limited, you have a few options:

- Use smaller models (3B parameters instead of 7B).
- Use bitsandbytes for 8-bit quantization, which reduces memory usage significantly.
- If you don't have a strong GPU, you can always outsource to cloud options such as Google Colab, the Hugging Face Inference API, or RunPod.

Accessing Llama Mo...
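The 8-bit quantization tip above can be sketched with the `transformers` and `bitsandbytes` libraries. This is a minimal sketch, not a definitive recipe: the model ID is an example 3B checkpoint (an assumption, swap in whichever Llama-family checkpoint you actually have access to), and it assumes `bitsandbytes` is installed alongside a CUDA-capable GPU.

```python
# Sketch: load a small Llama-family model in 8-bit to cut memory use.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Example 3B checkpoint (assumption); replace with your own model ID.
model_id = "openlm-research/open_llama_3b"

# 8-bit quantization via bitsandbytes roughly halves memory versus fp16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever GPU(s) are available
)
```

From here you can generate text as usual with `model.generate(...)`; the quantized weights only change how the model is stored in memory, not how you call it.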