CV

Applied Scientist at Microsoft with expertise in ML for information retrieval, computer vision, and NLP. IIT Jodhpur alumnus (2023).

Contact Information

Name: Akarsh Upadhyay
Professional Title: Applied Scientist
Email: akarshupadhyayabc@gmail.com

Professional Summary

Applied Scientist at Microsoft working on ML for ad retrieval and monetization. IIT Jodhpur alumnus with experience in information retrieval, computer vision, and NLP.

Experience

  • 2024 - present

    India

    Applied Scientist
    Microsoft (MSAN - Microsoft Audience Network)
    • Intent-Based Retrieval: Led development of next-generation encoder models for ad retrieval. Improved P@100 by 14.32 absolute points over the production baseline while using 30x less training data.
    • A/B Testing Impact: Delivered +0.61% CTR and +0.51% revenue lift in the NA region. Methodology adopted as standard by US and India MSAN teams.
    • Unified Evaluation Framework: Designed a single source-of-truth evaluation framework shared across IDC, STCA, and US MSAN teams, supporting encoder-based, Single/Multi-Intent, Generative Retrieval, and ANN variants.
    • High-Impact Index for Product Ads: ML-driven retrieval system that identifies high-impact product offers; replaced the legacy system, delivering a 2.77% revenue lift.
  • 2024

    India

    Machine Learning Engineer
    Zomato
    • Image Quality Score: Fine-tuned ResNet-50 for food image quality classification (F1: 90%). Now a critical component evaluating nearly every food image shown to users.
    • Ads Creation: Designed automated ad creation system using generative models for background generation and brand-specific styling.
    • Photo Cake: Built a real-time image overlay system using OpenCV; launched on Mother’s Day, resulting in 3,000+ photo cake sales across India.
  • 2023

    India

    ML Research Intern
    Enterpret
    • Text Similarity at Scale: Led a project to assess semantic similarity between sentences based on business value. Scaled the solution to 1M+ texts, achieving an F1 score of 85%.
    • Explored prompt engineering, transformer fine-tuning, novel loss functions, and LLMs to improve results.
    • Created comprehensive data preparation guidelines (20+ pages) and supervised the annotation team.

Education

  • 2019 - 2023

    Jodhpur, India

    B.Tech
    Indian Institute of Technology (IIT) Jodhpur
    Electrical Engineering
    • Core Member of the Robotics Club - participated in AI/ML competitions and hackathons.
    • Research on Document Layout Understanding under Dr. Santanu Chaudhury.
    • Project on Medical Visual Question Answering using NLP + Computer Vision.

Publications

  • 2024
    GDP: Generic Document Pretraining to Improve Document Understanding
    18th International Conference on Document Analysis and Recognition (ICDAR)

    A generic document pretraining approach that improves document understanding across various downstream tasks.

Projects

  • Document Layout Understanding

    Multi-modal transformer (DocFormer) for Visual Document Understanding (VDU) in English and multilingual settings. Guided by Dr. Santanu Chaudhury, IIT Jodhpur.

    • Applied DocFormer architecture for document understanding tasks.
    • Extended to multilingual document settings.
  • Medical Visual Question Answering

    VQA system for X-ray/MRI scans (brain, kidney, lungs) using NLP and Computer Vision (TensorFlow). Achieved ~90% accuracy on the test set.

    • Fused visual attention with NLP for multi-modal QA.
    • Trained on medical imaging datasets, achieving ~90% test accuracy.

Skills

Machine Learning & AI (Expert): Encoder Models, Representation Learning, Information Retrieval, Transformers, LLMs, Prompt Engineering, Fine-tuning, Generative AI
Computer Vision (Advanced): Image Classification, ResNet, CNN, OpenCV, Image Quality Assessment, Visual Document Understanding
Tools & Frameworks (Expert): PyTorch, TensorFlow, Python, Git, GPT-4

Languages

English: Fluent
Hindi: Native

Interests

Research Interests: Information Retrieval, Representation Learning, Data Quality, Document Understanding, Multimodal AI