Hawk Eye

Scan every data source for PII and secrets instantly

A powerful CLI tool that scans cloud storage, databases, and file systems for PII and secrets, supporting OCR, custom fingerprints, JSON reports, and Slack alerts.

Overview

Hawk Eye is a command‑line scanner for security teams, DevOps engineers, and compliance auditors who need to locate personally identifiable information (PII) and secret credentials across heterogeneous environments. It connects to cloud buckets (S3, GCS), relational and NoSQL databases (MySQL, PostgreSQL, MongoDB, CouchDB, Redis, Firebase), collaboration platforms (Slack, Google Drive), and local file systems. It then inspects a wide range of file types, including PDFs, Office documents, images, archives, and video, using text analysis and OCR.

Features & Integration

The tool can be installed via pip, run as a Docker container, or imported as a Python library, so it fits both CI/CD pipelines and ad‑hoc investigations. Results are emitted as JSON and can be sent to Slack for real‑time alerts. Advanced users can supply custom fingerprint patterns or adjust connection settings through YAML or inline JSON. Two caveats apply: PostgreSQL scanning requires the optional psycopg2-binary package, and some Linux distributions need extra graphics libraries for the cv2 dependency.

Highlights

Scans 12+ sources including cloud storage, databases, and Slack
Detects PII and secrets in documents, images, archives, and video via OCR
CLI, Docker, and Python API with JSON output and debug mode
Real‑time Slack alerts and support for custom fingerprint patterns

Pros

  • Broad source coverage reduces need for multiple tools
  • Handles many file formats and OCR for hidden data
  • Easy installation via pip or Docker
  • Python API enables seamless pipeline integration

Considerations

  • Extra dependencies required for some databases (e.g., psycopg2-binary)
  • Operates via command line only; no native GUI
  • Alert integrations currently limited to Slack
  • Performance may vary on very large data sets without tuning

Managed products teams compare with

When teams consider Hawk Eye, these hosted platforms usually appear on the same shortlist.

Amazon Macie

Managed sensitive data discovery and protection for Amazon S3.

BigID

Data intelligence platform focused on data privacy, security, and governance through sensitive data discovery and classification

OneTrust

Unified trust platform for privacy, consent, data governance, and compliance automation.

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Security teams needing automated PII discovery across heterogeneous environments
  • DevOps pipelines that require CI/CD scanning of code and assets
  • Compliance auditors verifying data privacy across cloud services
  • SMBs looking for a single tool to monitor secrets and personal data

Not ideal when

  • You need a full‑featured SIEM with correlation capabilities
  • Python or Docker dependencies cannot be installed in the environment
  • Your team depends on a native Windows GUI application
  • The use case demands real‑time inspection of streaming data

How teams use it

Cloud storage compliance audit

Identify exposed PII in S3 and GCS buckets, generate a JSON report, and notify a Slack channel.

Database secret leakage detection

Scan MySQL, PostgreSQL, and MongoDB for hard‑coded credentials and personal data, then alert the security team.

CI/CD pipeline integration

Run Hawk Eye as a Docker step to fail builds when secrets are found in code or bundled assets.
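
One possible way to implement the build gate is a small script that runs after the scan and reads the JSON report. The sketch below is not part of the tool. It assumes the report is a JSON object that maps each source name to a list of findings, which should be checked against a real report before relying on it. The names `count_findings` and `gate` are hypothetical.

```python
import json
import sys
from pathlib import Path


def count_findings(report: dict) -> int:
    """Count findings, assuming a mapping of source name -> list of findings."""
    # Assumed schema: values that are not lists are ignored.
    return sum(len(items) for items in report.values() if isinstance(items, list))


def gate(report_path: str) -> int:
    """Return a process exit code: 0 if the report is clean, 1 otherwise."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    total = count_findings(report)
    if total:
        print(f"{total} finding(s) in {report_path}; failing the build.")
        return 1
    print("No findings.")
    return 0


# Only act as a gate when a report path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(sys.argv[1]))
```

A CI job could call this script with the report path as its only argument; a non-zero exit code then marks the step as failed.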

File system forensics

Recursively scan local directories, archives, and PDFs for hidden PII, producing actionable findings.

Tech snapshot

Python 100%
Dockerfile 1%

Tags

infosec, cybersecurity, secrets-management, grc, scanner, auditing, datasecurity, pii, audit

Frequently asked questions

How do I provide connection details?

Create a connection.yml file with source credentials and mount it into the Docker container or pass it via --connection or --connection-json flags.
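
As a rough illustration of the inline option, the sketch below builds a connection payload in Python and serializes it for `--connection-json`. Only the `--connection-json` and `--json` flags come from the text above. The `sources`/`fs` layout, the profile name `local_docs`, the path, the `fs` sub-command, and the `hawk_scanner` executable name are assumptions for illustration; the README is the authority on the actual schema and command.

```python
import json
import shlex

# Hypothetical connection layout: the key names below are assumptions,
# not a confirmed schema. Consult the project's README for the real one.
connection = {
    "sources": {
        "fs": {
            "local_docs": {"path": "/data/docs"},
        }
    }
}


def build_command(conn: dict, report_path: str) -> list[str]:
    """Return an argv list that passes the connection inline as JSON."""
    return [
        "hawk_scanner",  # assumed executable name
        "fs",            # assumed sub-command selecting the file-system source
        "--connection-json", json.dumps(conn),
        "--json", report_path,
    ]


argv = build_command(connection, "report.json")
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(argv))
```

Building the argument list in code keeps credentials out of files on disk, which can be convenient when secrets are injected into a pipeline at run time.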

Can I customize the patterns it searches for?

Yes, you can supply a custom fingerprint.yml or inline fingerprint JSON to define regexes for additional data types.
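
Before adding a regex to a fingerprint file, it can help to try it against sample text. The sketch below assumes a fingerprint is simply a name paired with a regular expression; the two patterns, the `find_matches` helper, and the sample string are hypothetical, and the tool's own matching logic may differ.

```python
import re

# Hypothetical fingerprints: names and regexes are illustrative only.
fingerprints = {
    "Internal Employee ID": r"\bEMP-\d{6}\b",
    "Email Address": r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b",
}


def find_matches(text: str, patterns: dict[str, str]) -> dict[str, list[str]]:
    """Return every match per fingerprint name, skipping names with no hits."""
    results = {}
    for name, pattern in patterns.items():
        hits = re.findall(pattern, text)
        if hits:
            results[name] = hits
    return results


sample = "Contact jane.doe@example.com about badge EMP-004217."
print(find_matches(sample, fingerprints))
```

Checking patterns this way catches overly broad or overly narrow regexes before they produce noisy findings in a full scan.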

What output formats are supported?

Results can be written to a JSON file using the --json flag or printed to stdout; the Python API returns Python objects.

Do I need extra packages for specific databases?

Scanning PostgreSQL requires the psycopg2-binary package, and Red Hat Linux may need mesa-libGL for the cv2 dependency.

Is there commercial support available?

Commercial support can be obtained via the project's LinkedIn, Twitter, or Slack community as noted in the README.

Project at a glance

Active
Stars: 467
Watchers: 467
Forks: 55
Repo age: 2 years
Last commit: last week
Primary language: Python

Last synced 12 hours ago