This is Niraj! I’m currently a software engineer at Goldman Sachs. I’ll be documenting my notes, learnings and musings here because, contrary to popular opinion, I love documentation. Don’t @ me.
NOTE: NOT YET UPDATED WITH THE LATEST DETAILS.
Professional Experience
Riverus Technology Solutions Pvt. Ltd. (July 2018 till date)
Riverus is a software product company in the legal domain that applies machine learning to legal research tools. As a data scientist reporting to the product manager, I am responsible for:
- extracting high-value analytics from the legal corpus
- serving ML models through reliable, concurrent APIs
- building neural-network tools to solve text-based CAPTCHAs
- building robust machine learning pipelines for higher throughput
Projects
Final Year Project
Understanding and Analysis of Video and Image Advertisements (July 2018)
- Developed deep learning algorithms to assess the effectiveness of an advertisement and its emotional impact on the audience.
- Used the publicly available YouTube-8M video dataset and engineered a set of features suited to the task.
- We won the best project award in a competition organized by TCS.
Personal Projects
Identifying Customer Segments (Oct 2019)
- Applied unsupervised learning techniques to identify segments of the general population that could be converted into potential customers.
- Performed extensive exploratory data analysis.
- Trained a convolutional neural network (CNN) to solve text-based CAPTCHAs, helping automate the process that depended on them.
- Streamlined the process of gathering and annotating data.
- Fine-tuned the convolutional network to reach 95% accuracy.
Recommendation Engine for IBM (Oct 2019)
- Built a recommendation engine that uses user behavior and the social network to surface the content most likely to be relevant to each user
- Knowledge-based, content-based and collaborative filtering
- Matrix factorization
- Learned about the cold-start problem in recommendation algorithms
- Learnt tactics for assessing the effectiveness of recommendation engines
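The matrix factorization idea can be illustrated with a minimal sketch. Stochastic gradient descent learns a latent vector for each user and each item so that their dot product approximates the observed interactions. Unobserved user-item pairs are then scored with the same dot product. This is a generic FunkSVD-style example and not necessarily what the project used. The function names, the toy 0/1 interaction matrix and the hyper-parameters (`k`, `lr`, `reg`, `epochs`) are all illustrative assumptions.

```python
import random


def factorize(interactions, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=2000, seed=0):
    """Learn user/item latent factors from observed (user, item, value) triples with SGD.

    Hypothetical helper; the hyper-parameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    # Small random initialisation so the factors can break symmetry.
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in interactions:
            err = value - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error with L2 regularisation.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, user, item):
    """Score a user-item pair as the dot product of their latent factors."""
    return sum(a * b for a, b in zip(U[user], V[item]))


# Toy interactions: users 0-1 read articles 0-1, users 2-3 read articles 2-3.
# The pair (user 0, article 1) is deliberately left unobserved (None).
matrix = [
    [1, None, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
observed = [(u, i, v) for u, row in enumerate(matrix) for i, v in enumerate(row) if v is not None]
U, V = factorize(observed, n_users=4, n_items=4)
print(round(predict(U, V, 0, 1), 2))
```

With rank `k=2`, user 0's observed behaviour matches user 1's, so the held-out pair should score far closer to 1 than to 0. That held-out score is the signal a recommender would use to surface article 1 to user 0.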
- Predicted whether a customer is going to churn based on their behavior
- The data is simulated from an imaginary streaming service
- Evaluated the predictive model using the F1 score.
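The F1 score is the harmonic mean of precision and recall. A common reason to prefer it over accuracy for churn prediction is that churners are usually a minority class. A model that predicts "no churn" for everyone can then look accurate while being useless. Below is a minimal pure-Python sketch; the helper name and the toy labels are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: F1 is 0, or undefined if there are no positives at all
        # (this sketch returns 0.0 in both cases by convention).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
print(f1_score(y_true, y_pred))   # precision = recall = 2/3, so F1 = 2/3
print(f1_score(y_true, [0] * 8))  # always "no churn": 62.5% accurate, yet F1 = 0.0
```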
Professional Projects
Optical Character Recognition for PDF documents (Aug 18 - Jan 19)
- Built an OCR pipeline for converting scanned images to PDF for further processing.
- Used Google’s Tesseract to convert scanned images into text
- Developed a tool for removing watermarks from the PDF
Keyphrase Clustering (Nov 18 - Jan 19)
- Keyphrase clustering powers the auto-complete feature used when searching for legal cases, and indirectly helps identify similar cases.
- Trained a Word2Vec model on the legal corpus so that the model captures legal context.
- Used the trained model to cluster keyphrases into 800 buckets.
- Built a tool to assign new incoming keyphrases to these buckets.
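One plausible way to categorize a new keyphrase is sketched here under stated assumptions. The phrase is represented as the average of its Word2Vec word vectors and assigned to the bucket whose centroid has the highest cosine similarity. The mean-pooling step, the helper names and the tiny 2-dimensional vectors are hypothetical stand-ins. In practice the vectors would come from the trained Word2Vec model and the centroids from the 800-bucket clustering.

```python
import math


def phrase_vector(phrase, word_vectors):
    """Average the vectors of the phrase's in-vocabulary words (mean pooling is an assumption)."""
    vectors = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vectors:
        return None  # no known words: the phrase cannot be placed in a bucket
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_bucket(phrase, word_vectors, centroids):
    """Return the index of the most similar bucket centroid, or None for unknown phrases."""
    vec = phrase_vector(phrase, word_vectors)
    if vec is None:
        return None
    return max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))


# Hypothetical 2-d word vectors and two bucket centroids (bail-related vs. homicide-related).
word_vectors = {
    "anticipatory": [0.9, 0.1],
    "bail": [1.0, 0.0],
    "culpable": [0.1, 0.9],
    "homicide": [0.0, 1.0],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]
print(assign_bucket("Anticipatory Bail", word_vectors, centroids))  # 0
print(assign_bucket("culpable homicide", word_vectors, centroids))  # 1
```

Cosine similarity is used instead of Euclidean distance because Word2Vec similarity is conventionally measured by vector direction rather than magnitude.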
Issue Sentence Similarity (Nov 18 - Jan 19)
- Built a proof of concept for finding similar issue sentences.
- Created sentence embeddings using term frequency-inverse document frequency (TF-IDF).
- Fine-tuned the sentence embeddings to identify similar issue sentences, with satisfactory results.
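A minimal sketch of TF-IDF sentence vectors compared with cosine similarity. It assumes whitespace tokenization and the IDF variant idf(t) = ln(N / df(t)) + 1, which keeps terms that appear in every sentence from getting zero weight. The helper names and the example issue sentences are illustrative, and the fine-tuning step from the project is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (term -> weight dict) per sentence."""
    docs = [s.lower().split() for s in sentences]
    n_docs = len(docs)
    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency is normalised by sentence length.
        vectors.append({term: (c / len(doc)) * idf[term] for term, c in counts.items()})
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_index, sentences):
    """Index of the sentence most similar to sentences[query_index], excluding itself."""
    vectors = tfidf_vectors(sentences)
    query = vectors[query_index]
    candidates = [j for j in range(len(sentences)) if j != query_index]
    return max(candidates, key=lambda j: cosine(query, vectors[j]))


issues = [
    "whether the accused is entitled to bail",
    "whether bail can be granted to the accused",
    "whether the contract was validly terminated",
]
print(most_similar(0, issues))  # 1
```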
Named Entity Recognition (May 19 - Jul 19)
- The NER model extracts valuable information from legal text, such as Sections, Acts, Judges and Lawyers.
- Trained the NER model using Stanford’s CRF algorithm.
- Fine-tuned the hyper-parameters to suit business needs.
Co-curricular activities
- Organized a workshop for 50+ students giving a basic overview of TensorFlow through a toy project, a digit recognizer.
- Designed a website using PHP, HTML and CSS for a Rubik’s Cube Club that I had founded in my second year for organizing competitions and workshops.
- Smart India Hackathon (qualified for the finals)
- Infosys Hackathon (Second runners-up)