
Saud Khan

About Me

Data Science undergraduate with hands-on experience in machine learning, AI system development, and backend engineering. Seeking opportunities to apply my analytical and technical skills to real-world problems and to contribute to data-driven, AI-powered solutions in a professional environment.

Education

Ghulam Ishaq Khan Institute of Engineering Sciences and Technology (GIKI)
BS Data Science — CGPA: 3.13 / 4.00

Experience

NESL-IT — Data Scientist Intern

  • Developed and evaluated machine learning models using preprocessing and feature engineering techniques.
  • Performed exploratory data analysis and visualization to derive actionable insights.

IBM Pakistan — Software Developer

Collaborated closely with team members on AI-driven and backend engineering initiatives.

  • Designed and implemented RESTful backend APIs using Node.js for scalable applications.
  • Supported AI-driven automation and optimized data workflows for improved system efficiency.

Projects

SpeechEcho — Real-Time Voice Cloning & Speech Synthesis

Developed a real-time voice cloning and conversational speech synthesis system using PyTorch and XTTS-v2, with optimized preprocessing and inference pipelines for low-latency, high-quality output. The model was trained on a custom South Asian accent dataset to make customer support voices easier for South Asian callers to understand over the phone, reducing accent barriers and improving communication clarity.

  • PyTorch
  • XTTS-v2
  • Real-Time Inference

Retrieval-Augmented Generation System

LangChain, FAISS

Built an end-to-end RAG pipeline for question answering over research papers using LLMs, with a FAISS vector database for efficient semantic search and retrieval. Used Prometheus and Grafana to track query-accuracy and hallucination-rate metrics.

  • LangChain
  • FAISS
  • LLMs

Plant Disease Detection

CNN, VGG16

Applied transfer learning with VGG16 and data augmentation to classify plant diseases, and evaluated the model using precision, recall, and confusion-matrix analysis.

  • TensorFlow
  • CNN
  • Computer Vision

New York Housing Price Prediction

Performed data cleaning, feature engineering, and exploratory data analysis on the New York housing dataset, then built and compared regression models to predict housing prices and evaluated their performance.

  • Data Analysis
  • Regression
  • Python

Skills

Programming

Python, Node.js, SQL

Machine Learning & AI

Deep Learning, CNNs, RNNs, Transformers, NLP, LLMs

Frameworks

PyTorch, TensorFlow, scikit-learn

Cloud & Big Data

AWS, Microsoft Azure, Spark, Hadoop, Kafka

Other

Data Analysis, Model Training, API Development

Languages

Urdu (Native), English (Fluent)

Certifications & Awards

Certifications

  • Big Data (Spark, Scala, Kafka, Hadoop) — Udemy (2025)
  • Foundations: Data, Data, Everywhere — Google (2024)

Honors & Awards

  • Dean's Honor List — FCSE, GIKI (2024)
  • 3rd Position in Computer Science — BISE Mardan (2022)