Zama: Part 3.1, Concrete ML - No Code Introduction

Hello, friends!

Parts 2.1 and 2.2 covered the Zama project from a blockchain perspective. This article moves closer to my day job as a data scientist. One Zama product I find especially interesting lets data scientists build AI models that process encrypted data, so the work stays secure even if the data leaks. That product is called Concrete ML.

Introduction

The key strength of Fully Homomorphic Encryption (FHE) is that it can run computations on encrypted data without decrypting it first, so you don't have to worry about sensitive data leaking. Concrete ML is an open-source, privacy-preserving machine learning framework built on FHE. Its features include:

APIs that closely mirror the standard libraries (scikit-learn and PyTorch), letting you work with FHE-equivalent versions of machine learning models
Training linear models, and even fine-tuning LLMs, on encrypted data
Pre-processing encrypted data in DataFrame form (still with some limitations)

Data scientists who use this library don't need any background in cryptography at all. (Because of vibe coding? Yes and no: the concept of FHE is very easy to grasp, while most of the complexity is pushed into the computations running behind the scenes.)

Cryptography Concepts, Again

Readers coming from Web3 should already know the basics of encryption, decryption, and private/public keys, but I think TFHE deserves a short primer before the next article. At its core, FHE must preserve addition and multiplication on ciphertexts (encrypted data). The central idea of TFHE is managing the noise in ciphertexts. Noise is deliberately added during encryption to make it secure, but it grows as operations are applied and risks exploding under multiplication. TFHE keeps it in check with an operation called bootstrapping.
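To make the noise idea concrete, here is a toy LWE-style scheme (LWE is the family TFHE ciphertexts belong to). It is a sketch under loud assumptions. The parameters are tiny and insecure. Noise values are passed in explicitly so the run is reproducible. All function names are hypothetical. Multiplying by a plaintext integer stands in for multiplication, since real TFHE ciphertext multiplication and bootstrapping are not shown. It demonstrates that adding ciphertexts adds their noise, scaling multiplies it, and decryption breaks once the noise passes `DELTA / 2`.

```python
import random

# Toy LWE parameters: tiny and INSECURE, chosen only for illustration.
N = 16           # length of the secret key
Q = 2**32        # ciphertext modulus
P = 16           # plaintext modulus: messages live in [0, P)
DELTA = Q // P   # scaling factor that puts the message in the top bits

rng = random.Random(42)
SECRET = [rng.randint(0, 1) for _ in range(N)]  # binary secret key


def dot(a):
    """<a, s> mod Q with the secret key."""
    return sum(ai * si for ai, si in zip(a, SECRET)) % Q


def encrypt(m, noise):
    """Ciphertext (a, b) with b = <a, s> + noise + m * DELTA (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    return a, (dot(a) + noise + m * DELTA) % Q


def phase(ct):
    """b - <a, s> mod Q: the scaled message plus the accumulated noise."""
    a, b = ct
    return (b - dot(a)) % Q


def decrypt(ct):
    """Round the phase to the nearest multiple of DELTA to strip the noise."""
    return round(phase(ct) / DELTA) % P


def noise_of(ct, m):
    """Signed distance between the phase and the exact encoding of m."""
    d = (phase(ct) - m * DELTA) % Q
    return d - Q if d >= Q // 2 else d


def add(ct1, ct2):
    """Homomorphic addition: the messages add, and so do their noises."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


def scale(ct, k):
    """Multiply by a plaintext integer k: the noise is multiplied by k as well."""
    a, b = ct
    return [(k * x) % Q for x in a], (k * b) % Q


c3 = encrypt(3, noise=1_000)
c4 = encrypt(4, noise=-500)
c7 = add(c3, c4)
print(decrypt(c7), noise_of(c7, 7))  # 7 500

k = 2**18
big = scale(c3, k)
expected = (3 * k) % P               # 0, since 16 divides 2**18
print(noise_of(big, expected))       # 262144000, well above DELTA // 2
print(decrypt(big), "!=", expected)  # 1 != 0: the noise has corrupted the message
```

Real TFHE avoids this failure by bootstrapping, which homomorphically resets the noise to a small level before it can cross `DELTA / 2`.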


Training: The model is trained on plaintext data (training on data that is already encrypted is also possible).
Quantization: The inputs, model weights, and intermediate values computed during inference are converted to integer equivalents. There are two approaches, depending on the type of model:
- During Training (Quantization Aware Training): quantization layers are added to the NN so that weights can be discrete, and the activation quantization parameters are optimized through gradient descent. Using QAT requires re-training the NN with the quantization layers.
- After Training (Post Training Quantization): the floating-point neural network is kept as is, and a calibration step determines the quantization parameters for each layer. No re-training is needed, so converting an NN to FHE with PTQ requires no training data or labels.
Simulation: The quantized model can be executed in simulation to measure its accuracy in FHE and to evaluate the modifications needed to make it FHE-compatible.
Compilation: After quantizing the model and confirming through simulation that its FHE accuracy is good, the model is compiled with Concrete's FHE Compiler into an equivalent FHE circuit. The circuit is represented as an MLIR program that bundles the low-level cryptographic operations.
Inference: Once the proper keys have been generated, the compiled model can be executed on encrypted data. It can be deployed to a server and used to run private inference on encrypted inputs.
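The post-training quantization step can be sketched without any library: calibrate a scale and zero point from representative values, map floats onto n-bit integers, and run the dot product purely in integers. This is a generic uniform (affine) quantization sketch written for illustration, not Concrete ML's internal implementation. `N_BITS = 8`, the weights, and `calibration_inputs` (standing in for unlabeled representative data) are all assumptions. Comparing `y_quant` with `y_float` loosely mirrors the accuracy check of the Simulation step.

```python
# Generic uniform (affine) post-training quantization, written for illustration.
# Not Concrete ML's internal code; weights and calibration data are made up.
N_BITS = 8
MAX_Q = 2**N_BITS - 1  # unsigned integers in [0, 255]


def calibrate(values):
    """PTQ calibration: choose scale and zero point so [min, max] spans [0, MAX_Q]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 representable
    scale = (hi - lo) / MAX_Q if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to an n-bit integer, clipping values outside the calibrated range."""
    return max(0, min(MAX_Q, round(x / scale) + zero_point))


# Pretrained float weights of one linear neuron (hypothetical values)
weights = [0.5, -1.25, 2.0, 0.75]
# Unlabeled, representative inputs used only for calibration (hypothetical values)
calibration_inputs = [[1.0, 0.5, -0.3, 2.2], [0.2, -1.0, 1.5, 0.0], [-0.5, 0.8, 0.3, 1.0]]

w_scale, w_zp = calibrate(weights)
x_scale, x_zp = calibrate([v for row in calibration_inputs for v in row])

x = [1.0, 0.5, -0.3, 2.2]
qw = [quantize(w, w_scale, w_zp) for w in weights]
qx = [quantize(v, x_scale, x_zp) for v in x]

# Integer-only accumulation: the kind of arithmetic FHE schemes operate on
acc = sum((a - w_zp) * (b - x_zp) for a, b in zip(qw, qx))

y_quant = acc * w_scale * x_scale  # rescale back to float, outside the integer part
y_float = sum(w * v for w, v in zip(weights, x))
print(f"float: {y_float:.4f}  quantized: {y_quant:.4f}  accumulator: {acc}")
```

No labels appear anywhere in this sketch, which matches the PTQ description above: the float weights stay as they are, and only the calibration step decides how they map to integers.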



lordachita


Concrete

Welcome | Concrete

Overview

0. Data Scientist Methodology

1. Current Limitations

2. Hands-On

Titanic - Machine Learning from Disaster

3. [Geek Alert!] Inference in the Cloud

4. [Geek Alert!] Concrete ML Model Life Cycle

Production deployment | Concrete ML

Conclusion

Reference
