🔍 French Influencer Monitor

A Python web application that monitors French influencers. It scrapes public online data about them, classifies each mention as a controversy, a positive action, or neutral, and derives an overall trust score.

🌟 Features

Multi-Source Scraping, run in parallel across:
- 📰 French news sites (Le Monde, Le Figaro, etc.)
- 🎥 YouTube
- 🐦 Twitter/X
- 💬 Reddit
- 🗨️ French forums
AI-Powered Analysis:
- French sentiment analysis using CamemBERT
- Automatic classification: Drama, Good Action, or Neutral
- Keyword-based detection for controversies and positive actions
Trust Scoring System:
- 0-100 trust score calculation
- Weighted by recency (recent events matter more)
- Based on drama/good action ratio and sentiment
Interactive UI:
- Real-time search and analysis
- Visual dashboards with charts
- Detailed mention breakdown
- Source citations with links

🚀 Installation

Prerequisites

- Python 3.8+
- pip

Setup

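Clone the repository, or navigate to the project directory if you already have it. The path below is the author's local checkout; substitute your own:

cd "/Users/roane/roane/perso/hackathon blackbox"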

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configure environment (optional):

cp .env.example .env
# Edit .env with your API keys if needed

🎯 Usage

Run the Streamlit App

streamlit run streamlit_app.py

The app will open in your browser at http://localhost:8501

How to Use

1. Enter a French influencer's name (e.g., "Squeezie", "Norman", "Cyprien")
2. Click "🔍 Analyser"
3. Wait for the parallel scraping and analysis to complete
4. View the results:
   - Trust score (0-100)
   - Controversy count
   - Positive action count
   - Detailed mentions with sources
   - Visual charts and breakdowns

Using Cache

Enable "Utiliser le cache" in the sidebar to use previously analyzed data instead of re-scraping.

📊 Architecture

├── streamlit_app.py          # Main Streamlit UI
├── orchestrator.py            # Parallel execution coordinator
├── analyzer.py                # Sentiment analysis (CamemBERT)
├── scorer.py                  # Trust score calculation
├── database.py                # SQLite database management
├── config.py                  # Configuration and keywords
├── scrapers/
│   ├── base_scraper.py       # Base scraper class
│   ├── news_scraper.py       # News sites scraper
│   ├── youtube_scraper.py    # YouTube scraper
│   ├── twitter_scraper.py    # Twitter/X scraper
│   ├── reddit_scraper.py     # Reddit scraper
│   └── forum_scraper.py      # French forums scraper
└── requirements.txt           # Python dependencies

🧠 How It Works

1. Parallel Scraping

All scrapers run concurrently using asyncio, so the total wait is roughly that of the slowest source rather than the sum of all sources. Each scraper:

- Searches for the influencer's name
- Extracts relevant text excerpts and URLs
- Returns structured data
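
The concurrent step above can be sketched as follows. This is a minimal illustration, not the project's orchestrator: `Mention`, `fake_scraper`, and `scrape_all` are hypothetical names, and the scrapers are simulated with `asyncio.sleep` instead of real HTTP requests. The "forum" source is made to fail on purpose to show how one blocked site need not stop the others.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    """One scraped excerpt (hypothetical structure, not the project's schema)."""
    source: str
    text: str
    url: str


async def fake_scraper(source: str, name: str, delay: float) -> List[Mention]:
    """Stand-in for a real scraper: waits, then returns one dummy mention."""
    await asyncio.sleep(delay)  # simulates network latency
    if source == "forum":
        raise RuntimeError("site blocked the request")  # simulated failure
    return [Mention(source, f"Excerpt about {name}", f"https://example.com/{source}")]


async def scrape_all(name: str) -> List[Mention]:
    """Run every scraper concurrently and keep whatever succeeded."""
    sources = ["news", "youtube", "twitter", "reddit", "forum"]
    tasks = [fake_scraper(source, name, 0.01) for source in sources]
    # return_exceptions=True returns failures as values instead of raising,
    # so one failing scraper does not discard the other results.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    mentions: List[Mention] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # skip the failed source and continue with available data
        mentions.extend(result)
    return mentions


mentions = asyncio.run(scrape_all("Squeezie"))
print(len(mentions), "mentions collected")
```

Passing `return_exceptions=True` is one way to get the "continue with available data" behavior; a real scraper would likely also log the failure.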

2. Sentiment Analysis

- Uses the cmarkea/distilcamembert-base-sentiment model
- Analyzes French text for positive/negative sentiment
- Combines the model output with keyword matching for classification
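
As a rough sketch of how model output could be turned into a positive/negative signal: the helper below assumes the model reports star-style labels such as "1 star" to "5 stars" (check the model card before relying on this) and maps them to a score between -1 and 1. The function names and the mapping are illustrative assumptions, not code taken from analyzer.py.

```python
def stars_to_sentiment(label: str) -> float:
    """Map a star label ('1 star' .. '5 stars') to a score in [-1.0, 1.0].

    Assumption: the label starts with the number of stars.
    """
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected star rating: {label!r}")
    return (stars - 3) / 2.0  # 1 star -> -1.0, 3 stars -> 0.0, 5 stars -> 1.0


def sentiment_class(score: float) -> str:
    """Collapse a numeric score into positive / negative / neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for label in ("1 star", "3 stars", "5 stars"):
    score = stars_to_sentiment(label)
    print(label, score, sentiment_class(score))
```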

3. Classification

Content is classified as:

- Drama: Negative sentiment + controversy keywords (scandale, polémique, etc.)
- Good Action: Positive sentiment + charity keywords (don, charité, etc.)
- Neutral: Everything else
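
The rules above can be sketched as a small function. The keyword tuples contain only the examples named in this README (the full lists live in config.py and may differ), and `classify` and `contains_keyword` are hypothetical names. Whole-word matching is a design choice made here so that "don" does not match inside "donc"; the project may match keywords differently.

```python
import re
from typing import Iterable

# Illustrative keyword lists; the real ones are defined in config.py.
DRAMA_KEYWORDS = ("scandale", "polémique")
GOOD_ACTION_KEYWORDS = ("don", "charité")


def contains_keyword(text: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive whole-word match against a keyword list."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(keyword in words for keyword in keywords)


def classify(text: str, sentiment: str) -> str:
    """Return 'drama', 'good_action' or 'neutral'.

    `sentiment` is expected to be 'positive', 'negative' or 'neutral'.
    """
    if sentiment == "negative" and contains_keyword(text, DRAMA_KEYWORDS):
        return "drama"
    if sentiment == "positive" and contains_keyword(text, GOOD_ACTION_KEYWORDS):
        return "good_action"
    return "neutral"


print(classify("Nouveau scandale autour de la vidéo", "negative"))
print(classify("Il a fait un don à une association", "positive"))
```

Both conditions must hold for a non-neutral label: a controversy keyword in a positively toned text, for example, stays neutral.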

4. Trust Score Calculation

Base Score = 50
+ (good_actions × 10 × recency_weight)
- (dramas × 15 × recency_weight)
+ (avg_sentiment × 20)
Normalized to 0-100

Recency weight uses exponential decay: more recent mentions have higher impact.
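
A worked sketch of the formula follows. Several details are assumptions, because the README does not specify them: the recency weight is applied per mention, the decay uses a 180-day half-life (`HALF_LIFE_DAYS`), `avg_sentiment` lies between -1 and 1, and "normalized" is read as clamping to the 0-100 range. The names `ScoredMention`, `recency_weight`, and `trust_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

HALF_LIFE_DAYS = 180.0  # assumption: a mention loses half its weight every 180 days


@dataclass
class ScoredMention:
    label: str        # "drama", "good_action" or "neutral"
    sentiment: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    age_days: float   # days since the mention was published


def recency_weight(age_days: float) -> float:
    """Exponential decay: 1.0 for a mention from today, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def trust_score(mentions: List[ScoredMention]) -> float:
    """Apply the README formula and clamp the result to [0, 100]."""
    score = 50.0  # base score
    for mention in mentions:
        weight = recency_weight(mention.age_days)
        if mention.label == "good_action":
            score += 10.0 * weight
        elif mention.label == "drama":
            score -= 15.0 * weight
    if mentions:
        avg_sentiment = sum(m.sentiment for m in mentions) / len(mentions)
        score += 20.0 * avg_sentiment
    return max(0.0, min(100.0, score))


example = [
    ScoredMention("good_action", 0.0, 0.0),  # today: +10
    ScoredMention("drama", 0.0, 180.0),      # one half-life old: -7.5
]
print(trust_score(example))
```

With these assumptions, a drama from today costs 15 points while the same drama one half-life ago costs 7.5, which is what "recent events matter more" amounts to numerically.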

🔧 Configuration

Edit config.py to customize:

- Sentiment model
- Drama/good action keywords
- News sources
- Scraping parameters

📝 Database

SQLite database stores:

- Influencers: Name, trust score, counts, timestamps
- Mentions: All scraped data with sentiment labels
- Analysis History: Historical trust scores
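
To make the three tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are guesses based on the descriptions above, not the schema in database.py, and the project itself lists SQLAlchemy, which this sketch does not use. `open_database` and `record_analysis` are hypothetical helpers.

```python
import sqlite3

# Hypothetical schema inferred from the README's table descriptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS influencers (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    trust_score REAL,
    drama_count INTEGER DEFAULT 0,
    good_action_count INTEGER DEFAULT 0,
    updated_at TEXT
);
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    influencer_id INTEGER NOT NULL REFERENCES influencers(id),
    source TEXT,
    url TEXT,
    excerpt TEXT,
    label TEXT,
    scraped_at TEXT
);
CREATE TABLE IF NOT EXISTS analysis_history (
    id INTEGER PRIMARY KEY,
    influencer_id INTEGER NOT NULL REFERENCES influencers(id),
    trust_score REAL,
    analyzed_at TEXT
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_analysis(conn: sqlite3.Connection, name: str,
                    trust_score: float, analyzed_at: str) -> int:
    """Update the influencer's current score and append a history row."""
    conn.execute("INSERT OR IGNORE INTO influencers (name) VALUES (?)", (name,))
    (influencer_id,) = conn.execute(
        "SELECT id FROM influencers WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "UPDATE influencers SET trust_score = ?, updated_at = ? WHERE id = ?",
        (trust_score, analyzed_at, influencer_id),
    )
    conn.execute(
        "INSERT INTO analysis_history (influencer_id, trust_score, analyzed_at) "
        "VALUES (?, ?, ?)",
        (influencer_id, trust_score, analyzed_at),
    )
    conn.commit()
    return influencer_id
```

Keeping the current score on the influencer row and appending every run to a history table lets a cached lookup read one row while past scores remain available for charts.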

⚠️ Limitations

- Web scraping depends on site availability and page structure, so a scraper can break when a site changes its layout
- Some platforms (e.g., Twitter/X) may require API keys for better results
- Sentiment analysis accuracy depends on text quality
- Rate limiting may slow down scraping

🛠️ Troubleshooting

Model Download Issues

If the sentiment model fails to download:

# Pre-download the model
python -c "from transformers import AutoTokenizer, AutoModelForSequenceClassification; AutoTokenizer.from_pretrained('cmarkea/distilcamembert-base-sentiment'); AutoModelForSequenceClassification.from_pretrained('cmarkea/distilcamembert-base-sentiment')"

Scraping Errors

- Check your internet connection
- Some sites may block scrapers; this is normal
- The app will continue with available data

📄 License

This project is for educational purposes (Hackathon).

🤝 Contributing

Built for the Blackbox Hackathon 2025.

🎓 Technologies Used

- Backend: Python, asyncio, aiohttp
- UI: Streamlit
- Scraping: BeautifulSoup, Newspaper3k
- AI: HuggingFace Transformers (CamemBERT)
- Database: SQLite, SQLAlchemy
- Visualization: Plotly

Made with ❤️ for the Blackbox Hackathon
