Niraj Pandkar

This is Niraj! I’m currently a software engineer at Goldman Sachs. I’ll be documenting my notes, learnings, and musings here because, contrary to popular opinion, I love documentation. Don’t @ me.

NOTE: YET TO BE UPDATED TO THE LATEST.

Professional Experience

Riverus Technology Solutions Pvt. Ltd. (July 2018 to date)

Riverus is a software product company in the legal domain, focused on applying machine learning to legal research tools. As a data scientist, I report to the product manager. I am responsible for:

  • extracting high-value analytics from the legal corpus
  • providing reliable and concurrent APIs based on ML models
  • building tools using neural networks to solve text-based captchas
  • building robust machine learning pipelines for faster throughput

Projects

Final Year Project

Understanding and Analysis of Video and Image Advertisements (July 2018)

  • Developed deep learning algorithms to assess the effectiveness of an advertisement and its emotional impact on the audience.
  • Used a public dataset, the YouTube-8M video dataset, and engineered a set of features suited to this goal.
  • We won the best project award in a competition organized by TCS.

Personal Projects

Identifying Customer Segments (Oct 2019)

  • Applied unsupervised learning techniques to identify segments of the general population that could be converted into potential customers.
  • Performed extensive exploratory data analysis

Captcha Solver (Jan 2019)

  • Trained a convolutional neural network to solve text-based captchas, which helped automate the process that depended on them.
  • Streamlined the process of gathering and annotating data
  • Fine-tuned the convolutional network to reach 95% accuracy

Recommendation Engine for IBM (Oct 2019)

  • Built a recommendation engine that uses user behavior and social network data to surface the content most likely to be relevant to a user
  • Knowledge-based, content-based, and collaborative filtering
  • Matrix factorization
  • Became aware of the cold-start problem in recommendation algorithms
  • Learnt tactics for assessing the effectiveness of recommendation engines

Sparkify (March 2019)

  • Predicted whether a customer is going to churn based on their behavior
  • The data provided is simulated data from a fictional streaming service
  • Evaluation of the predictive model was done using F1 score.
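
A note on the metric: churn data is often imbalanced (few churners, many non-churners), so plain accuracy can look good even for a model that never predicts churn. F1 balances precision and recall on the positive class. The snippet below is a minimal sketch of the computation, not the project's actual evaluation code; the function name and the toy labels are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```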

Professional Projects

Optical Character Recognition for PDF documents (Aug 18 - Jan 19)

  • Built an OCR pipeline for converting scanned images to PDF for further processing.
  • Used Google’s Tesseract to convert scanned images into text
  • Developed a tool for removing watermarks from PDFs

Keyphrase Clustering (Nov 18 - Jan 19)

  • Keyphrase clustering is used for the auto-complete feature when searching for legal cases and, indirectly, for identifying similar cases.
  • Trained a Word2Vec model on the legal corpus so that the embeddings capture legal context.
  • Used the trained model to cluster key phrases into 800 buckets.
  • Built a tool to assign new incoming key phrases to these buckets

Issue Sentence Similarity (Nov 18 - Jan 19)

  • Worked on a proof of concept to achieve issue sentence similarity.
  • Created sentence embeddings using term frequency-inverse document frequency (TF-IDF)
  • Fine-tuned the sentence embeddings to determine similar issue sentences and achieved a satisfactory result
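
For readers unfamiliar with the approach, here is a minimal sketch of TF-IDF sentence similarity using only the standard library. It is an illustration under assumptions, not the proof of concept itself: whitespace tokenization, smoothed IDF, and cosine similarity are my choices here, and the helper names and example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every sentence keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up issue sentences, for illustration only.
sentences = [
    "whether the tenant is liable to pay arrears of rent",
    "whether the tenant must pay arrears of rent to the landlord",
    "whether the assessment order is barred by limitation",
]
vecs = tfidf_vectors(sentences)
sim_related = cosine(vecs[0], vecs[1])    # two rent-related issues
sim_unrelated = cosine(vecs[0], vecs[2])  # rent issue vs. limitation issue
```

With these toy sentences, the two rent-related issues score higher than the unrelated pair, which is the behavior a similarity search over issue sentences relies on.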

Named Entity Recognition (May 19 - Jul 19)

  • The NER model is used to extract valuable information from text, such as sections, acts, judges, and lawyers.
  • Trained the NER model using Stanford NER’s CRF classifier
  • Tuned the hyperparameters to suit the company’s needs

Co-curricular activities

  • Organized a workshop for 50+ students that gave a basic overview of how to use TensorFlow through a toy project – Digit Recognizer.
  • Designed a website using PHP, HTML and CSS for a Rubik’s Cube Club that I founded in my second year, to organize competitions and workshops.
  • Smart India Hackathon (Qualified for the finals)
  • Infosys Hackathon (Second runner-up)