Building a resume-job matching service
(Part 1)
A personal project in web scraping, ML/NLP, and cloud deployment
Part 1 Summary
In this post I introduce my project: a job search tool that automatically finds the job postings that best match your resume. I discuss the project outline and the expected results.
The problem
After I returned to the United States from my postdoctoral studies at the Max Planck Institute for Intelligent Systems, I faced both a job search and a lateral career change, from academic research to applied industry work in data science and machine learning. As one of the most popular job boards, LinkedIn quickly became my regular homepage to find and apply for new job listings in my area.
I quickly found, however, that manually searching through job listings on LinkedIn is not only time-consuming but also yields pretty poor results when it comes to jobs that match my experience and interests. Take a look at the image below to see the typical LinkedIn job listing UI:
There are several problems with the LinkedIn job search:
- Any given query can return tens to hundreds of results, which must be manually reviewed
- You can't view a job description until you click on the job card, requiring time and effort
- It takes valuable time to skim through each description to see if you meet the job requirements
- Other information, like the post date, is also obscured in the job card
- Query results span multiple pages that you have to click through
- Results can be littered with job board spam (jobs reposted to orgs like Jobot)
- Irrelevant jobs appear in the results (e.g. positions like Manager, Lead, etc. which I'm not looking for)
In theory, some of these problems can be resolved with Boolean search terms, which some users claim work, but I have never gotten them to work in the default LinkedIn search.
My approach
With all these problems in mind, I decided to create a job search and ranking app with three goals: (1) reduce the time I spend each day on job hunting, (2) find jobs that are highly relevant to my skills and experience, and (3) serve as a learning tool for my own professional development.
The approach is relatively simple. First, create an automated web scraper that extracts all of the job data (title, company, description, etc.) from a LinkedIn search query. Then, apply machine learning and NLP tools to rank the scraped jobs against a resume (in this case, mine), returning the most relevant jobs, which I can then manually review and decide whether to apply for. Finally, to be useful beyond my own purposes, deploy these systems on AWS so that they are scalable and accessible through API calls.
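To make the ranking step more concrete, here is a minimal sketch of one way it could work. It is not the method used later in this series: it assumes plain TF-IDF weighting with cosine similarity as a stand-in for the "ML/NLP tools", and the function names, the job dictionary fields, and the sample data are all hypothetical. It uses only the Python standard library so it runs as-is.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps '+' and '#' for terms like c++)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no term gets a zero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(resume_text, jobs):
    """Return (score, job) pairs sorted from most to least similar to the resume."""
    vectors = tfidf_vectors([resume_text] + [job["description"] for job in jobs])
    resume_vec, job_vecs = vectors[0], vectors[1:]
    scored = [(cosine_similarity(resume_vec, vec), job) for vec, job in zip(job_vecs, jobs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, standing in for scraped job postings.
example_resume = "Data scientist with Python, machine learning, and NLP research experience"
example_jobs = [
    {"title": "Sales Manager", "company": "OtherCo",
     "description": "Lead a regional sales team and manage client accounts"},
    {"title": "Machine Learning Engineer", "company": "ExampleCo",
     "description": "Build machine learning and NLP models in Python"},
]

for score, job in rank_jobs(example_resume, example_jobs):
    print(f"{score:.3f}  {job['title']} at {job['company']}")
```

Word-overlap scoring like this is crude (it knows nothing about synonyms or seniority), but it shows the shape of the pipeline: scraped job records go in, and a list sorted by relevance to the resume comes out.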
Project outline
My blog posts follow the same steps as the approach above, covering each one in turn. I've organized the series into the following parts:
- Part 1: Project overview (you are here!)
- Part 2: Creating and testing a web scraper for LinkedIn job postings.
- Part 3: Using NLP and ML tools to match and rank jobs to a resume.
- Part 4: Deploying my systems on AWS cloud for scalability.
Full code for the project can be found on my GitHub repository.
Let's begin in Part 2!