Integrity

Ensuring integrity in hackathons by flagging suspicious activity.

Categories

  • HawkHacks Global Category

Inspiration

Hackathons are an opportunity for amazing growth and development. Sadly, there are people who do not believe in what a hackathon stands for and who try to take advantage of the system by breaking hackathon guidelines. This shows up as reused projects, faked repositories, and more. It undermines the hard work of most hackers and erodes the integrity that hackathons deserve. We aim to strengthen trust in submitted projects by building a tool that detects red flags within submissions.

What it does

Integrity, when given a Devpost hackathon link, checks for suspicious activity within the submitted projects. It runs similarity scans against previously submitted projects to check for plagiarism, and looks through each project's GitHub repository for suspicious or malicious activity, such as having more contributors than stated or commits made before the hackathon start date. Integrity then flags these projects so they can be reviewed to ensure they follow the guidelines.
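As a rough illustration of the two repository checks described above, here is a minimal sketch. It is not Integrity's actual code: the function name `flag_repository`, the `(author, timestamp)` commit shape, and the sample dates and logins are all hypothetical, and it assumes the commit data has already been fetched.

```python
from datetime import datetime, timezone


def flag_repository(commits, stated_members, hackathon_start):
    """Return a list of red flags for one submission.

    commits: list of (author_login, commit_datetime) tuples (assumed shape).
    stated_members: logins listed on the submission page.
    hackathon_start: timezone-aware datetime of the official start.
    """
    flags = []

    # Red flag 1: commits made before the hackathon started.
    early = [when for _, when in commits if when < hackathon_start]
    if early:
        flags.append(f"{len(early)} commit(s) made before the hackathon start")

    # Red flag 2: commit authors who are not listed as team members.
    extra = sorted({author for author, _ in commits} - set(stated_members))
    if extra:
        flags.append("contributors not listed on the submission: " + ", ".join(extra))

    return flags


# Made-up sample data, for illustration only.
start = datetime(2024, 5, 17, 18, 0, tzinfo=timezone.utc)
commits = [
    ("alice", datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)),
    ("alice", datetime(2024, 5, 18, 9, 30, tzinfo=timezone.utc)),
    ("mallory", datetime(2024, 5, 18, 10, 0, tzinfo=timezone.utc)),
]
print(flag_repository(commits, {"alice", "bob"}, start))
```

A flagged project is only a candidate for human review; an early commit may simply be an initial README or boilerplate, so the output is phrased as a hint rather than a verdict.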

How we built it

We built Integrity using Python. We used BeautifulSoup to scrape the project descriptions and GitHub repository links from Devpost. We then ran similarity tests on the scraped descriptions using the Gemini 1.5 AI model. The frontend was made using Streamlit.
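To make the similarity step concrete, here is a minimal sketch of comparing scraped descriptions pairwise. It deliberately swaps in Python's standard-library `difflib.SequenceMatcher` as a stand-in for the Gemini model, so it only catches near-verbatim copies rather than paraphrases. The function names, the `0.8` threshold, and the sample descriptions are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Character-level similarity in [0, 1]; a crude stand-in for an LLM judgment."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(descriptions, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold.

    descriptions: dict mapping project name to its scraped description text.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(descriptions.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs


# Made-up descriptions, for illustration only.
descriptions = {
    "A": "An app that matches volunteers with local food banks.",
    "B": "An app that matches volunteers with local food banks!",
    "C": "A VR game about exploring the deep ocean with friends.",
}
for name_a, name_b, score in similar_pairs(descriptions):
    print(f"{name_a} and {name_b} look similar (score {score})")
```

Comparing every pair grows quadratically with the number of projects, which is one reason rate limits on a hosted model become a bottleneck on large hackathons.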

Challenges we ran into

A huge challenge we faced during development was finding a suitable LLM for scanning similarity between projects. Since we were using free accounts and needed to run on large sets of data, model quality and efficiency were of utmost importance. Most of our team had also never done web scraping before, so figuring out how to fetch data properly and efficiently was challenging.

Accomplishments that we're proud of

We're proud of developing an efficient web scraper: we significantly cut the time needed to scan all projects and users, bringing it down to a matter of seconds. We're also proud of our use of LLMs to analyze the scraped data, as our team had limited prior experience using LLMs in our own projects.

What's next

In the future, we would love to improve the speed of Integrity by using paid tokens instead of a free trial, as we are currently heavily rate-limited. There are also other potential red flags that could be detected and would be useful to surface within Integrity.
