PhiUSIIL Phishing URL (Website)

Donated on 3/3/2024

PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs analyzed while constructing the dataset were recent at the time of collection. Features are extracted from the webpage source code and from the URL itself. Some features, such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb, are derived from other extracted features.
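As a quick sanity check on the counts in the description, the following minimal sketch adds the two class counts and computes each class's share. The function name `class_balance` is hypothetical and chosen only for illustration; the two counts are taken from the description.

```python
# Class counts as stated in the dataset description.
LEGITIMATE_COUNT = 134_850
PHISHING_COUNT = 100_945


def class_balance(legitimate: int, phishing: int) -> dict:
    """Return the total number of URLs and the share of each class."""
    total = legitimate + phishing
    return {
        "total": total,
        "legitimate_share": legitimate / total,
        "phishing_share": phishing / total,
    }


balance = class_balance(LEGITIMATE_COUNT, PHISHING_COUNT)
print(balance["total"])  # 235795, matching the listed number of instances
print(f"{balance['legitimate_share']:.1%} legitimate, "
      f"{balance['phishing_share']:.1%} phishing")
```

The classes are moderately imbalanced (roughly 57% legitimate to 43% phishing), which may be worth keeping in mind when choosing evaluation metrics.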

Dataset Characteristics

Tabular

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Real, Categorical, Integer

# Instances

235795

# Features

54

Dataset Information

What do the instances in this dataset represent?

URLs and their corresponding webpages

Has Missing Values?

No

Variables Table

Variable Name         Role     Type         Description  Units  Missing Values
FILENAME              Other    Categorical                      no
URL                   Feature  Categorical                      no
URLLength             Feature  Integer                          no
Domain                Feature  Categorical                      no
DomainLength          Feature  Integer                          no
IsDomainIP            Feature  Integer                          no
TLD                   Feature  Categorical                      no
URLSimilarityIndex    Feature  Integer                          no
CharContinuationRate  Feature  Integer                          no
TLDLegitimateProb     Feature  Continuous                       no

Showing the first 10 of 56 variables.

Additional Variable Information

Column "FILENAME" can be ignored.
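A minimal sketch of dropping the "FILENAME" column while reading the data, assuming the dataset is distributed as a CSV file with a header row. The helper name `read_rows_without_filename`, the `label` column name, and the sample rows are assumptions made for illustration and are not taken from the dataset.

```python
import csv
import io


def read_rows_without_filename(stream, ignored=("FILENAME",)):
    """Yield each CSV row as a dict, leaving out the ignored columns."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield {name: value for name, value in row.items() if name not in ignored}


# Tiny made-up sample (not real dataset rows); "label" is an assumed column name.
sample = io.StringIO(
    "FILENAME,URL,URLLength,label\n"
    "a.txt,https://www.example.com,23,1\n"
    "b.txt,http://example.test/login,25,0\n"
)

rows = list(read_rows_without_filename(sample))
print(rows[0])
```

To apply the same helper to a downloaded file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the in-memory sample.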

Class Labels

Label 1 corresponds to a legitimate URL, and label 0 corresponds to a phishing URL.
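The label encoding can be made explicit in code, which helps avoid mixing up the classes, since many phishing datasets use 1 for the phishing class instead. The sketch below is illustrative: `LABEL_NAMES`, `label_name`, and `count_classes` are hypothetical names, and the sample labels are made up.

```python
from collections import Counter

# Encoding stated on this page: 1 = legitimate, 0 = phishing.
LABEL_NAMES = {1: "legitimate", 0: "phishing"}


def label_name(label) -> str:
    """Map a raw label (int or numeric string) to its class name."""
    return LABEL_NAMES[int(label)]


def count_classes(labels) -> Counter:
    """Count how many labels fall into each named class."""
    return Counter(label_name(label) for label in labels)


# Made-up labels, mixing strings (as read from CSV) and integers.
counts = count_classes(["1", "0", "1", 1])
print(counts["legitimate"], counts["phishing"])
```

If a model or metric expects phishing as the positive class, the labels would need to be inverted before training or evaluation.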


Creators

Arvind Prasad

arvindbitm@gmail.com

Babasaheb Bhimrao Ambedkar University

Shalini Chandra

Babasaheb Bhimrao Ambedkar University
