A comprehensive Python web application that monitors French influencers by scraping public online data and classifying them based on controversies, positive actions, and overall trustworthiness.
Features:
Multi-Source Scraping: Parallel scraping from:
- 📰 French news sites (Le Monde, Le Figaro, etc.)
- 🎥 YouTube
- 🐦 Twitter/X
- 🗨️ French forums
AI-Powered Analysis:
- French sentiment analysis using CamemBERT
- Automatic classification: Drama, Good Action, or Neutral
- Keyword-based detection for controversies and positive actions
Trust Scoring System:
- 0-100 trust score calculation
- Weighted by recency (recent events matter more)
- Based on the number of dramas vs. good actions and on average sentiment
Interactive UI:
- Real-time search and analysis
- Visual dashboards with charts
- Detailed mention breakdown
- Source citations with links
Prerequisites:
- Python 3.8+
- pip
Installation:
- Clone or navigate to the project directory:
    cd "/Users/roane/roane/perso/hackathon blackbox"
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Configure the environment (optional):
    cp .env.example .env
    # Edit .env with your API keys if needed
Running the App:
    streamlit run streamlit_app.py
The app will open in your browser at http://localhost:8501.
Usage:
- Enter a French influencer's name (e.g., "Squeezie", "Norman", "Cyprien")
- Click "🔍 Analyser" (Analyze)
- Wait for the parallel scraping and analysis to complete
- View the results:
  - Trust score (0-100)
  - Controversy count
  - Positive action count
  - Detailed mentions with sources
  - Visual charts and breakdowns
Enable "Utiliser le cache" ("Use cache") in the sidebar to reuse previously analyzed data instead of re-scraping.
Project Structure:
├── streamlit_app.py # Main Streamlit UI
├── orchestrator.py # Parallel execution coordinator
├── analyzer.py # Sentiment analysis (CamemBERT)
├── scorer.py # Trust score calculation
├── database.py # SQLite database management
├── config.py # Configuration and keywords
├── scrapers/
│ ├── base_scraper.py # Base scraper class
│ ├── news_scraper.py # News sites scraper
│ ├── youtube_scraper.py # YouTube scraper
│ ├── twitter_scraper.py # Twitter/X scraper
│ ├── reddit_scraper.py # Reddit scraper
│ └── forum_scraper.py # French forums scraper
└── requirements.txt # Python dependencies
How It Works:
Parallel Scraping: all scrapers run concurrently using asyncio to maximize speed.
- Each scraper searches for the influencer's name
- Extracts relevant text excerpts and URLs
- Returns structured data
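A minimal sketch of the concurrent fan-out, assuming each scraper is an async function that returns a list of mention dicts. The scraper functions below are hypothetical stand-ins that simulate latency with asyncio.sleep; they are not the classes in scrapers/. Passing return_exceptions=True to asyncio.gather is one way to get the "app will continue with available data" behavior: a source that fails is skipped instead of cancelling the others.

```python
import asyncio
from typing import Dict, List

# Hypothetical stand-ins for the real scrapers; each returns mention dicts.
async def scrape_news(name: str) -> List[Dict[str, str]]:
    await asyncio.sleep(0.1)  # simulate network latency
    return [{"source": "news", "text": f"Article sur {name}", "url": "https://example.com/a"}]

async def scrape_youtube(name: str) -> List[Dict[str, str]]:
    await asyncio.sleep(0.1)
    return [{"source": "youtube", "text": f"Vidéo de {name}", "url": "https://example.com/v"}]

async def scrape_forum(name: str) -> List[Dict[str, str]]:
    await asyncio.sleep(0.05)
    raise ConnectionError("blocked by site")  # simulate a site blocking the scraper

async def scrape_all(name: str) -> List[Dict[str, str]]:
    scrapers = [scrape_news, scrape_youtube, scrape_forum]
    # Run every scraper concurrently; exceptions are returned, not raised.
    results = await asyncio.gather(*(s(name) for s in scrapers), return_exceptions=True)
    mentions: List[Dict[str, str]] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # skip the failed source, keep the data we do have
        mentions.extend(result)
    return mentions

if __name__ == "__main__":
    for mention in asyncio.run(scrape_all("Squeezie")):
        print(mention["source"], mention["url"])
```

Because the scrapers wait concurrently, the total run time is close to that of the slowest source rather than the sum of all of them.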
Sentiment Analysis:
- Uses the cmarkea/distilcamembert-base-sentiment model
- Analyzes French text for positive/negative sentiment
- Combines sentiment with keyword matching for classification
Content is classified as:
- Drama: Negative sentiment + controversy keywords (scandale, polémique, etc.)
- Good Action: Positive sentiment + charity keywords (don, charité, etc.)
- Neutral: Everything else
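The rules above can be sketched as a small function. This is an illustration, not the code in analyzer.py: the keyword sets contain only the examples named above, the sentiment argument is assumed to be a "positive"/"negative"/"neutral" string produced upstream, and classify is a hypothetical name. Matching whole words rather than substrings avoids false positives such as "don" inside "donc" or "pardon".

```python
import re

# Hypothetical keyword sets; the real lists live in config.py.
DRAMA_KEYWORDS = {"scandale", "polémique"}
GOOD_ACTION_KEYWORDS = {"don", "charité"}

def classify(text: str, sentiment: str) -> str:
    """Return "drama", "good_action" or "neutral" for one mention.

    sentiment is assumed to be "positive", "negative" or "neutral".
    """
    # Tokenize into lowercase words (\w matches accented letters in Python 3).
    words = set(re.findall(r"\w+", text.lower()))
    if sentiment == "negative" and words & DRAMA_KEYWORDS:
        return "drama"
    if sentiment == "positive" and words & GOOD_ACTION_KEYWORDS:
        return "good_action"
    return "neutral"
```

A keyword alone is not enough: "Polémique après le live" with positive sentiment stays neutral, because both conditions must hold.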
Trust Score Calculation:
    score = 50
          + (good_actions × 10 × recency_weight)
          - (dramas × 15 × recency_weight)
          + (avg_sentiment × 20)
The result is normalized to the 0-100 range.
Recency weight uses exponential decay: more recent mentions have higher impact.
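The formula leaves a few details open, so here is a minimal sketch under one reading of it: each mention contributes its own exponentially decayed weight, the decay uses a hypothetical 30-day half-life, avg_sentiment is assumed to lie in [-1, 1], and "normalized" is taken to mean clamped to 0-100. The names trust_score, recency_weight and HALF_LIFE_DAYS are illustrative, not the API of scorer.py.

```python
import math
from datetime import datetime, timezone
from typing import List, Tuple

HALF_LIFE_DAYS = 30.0  # hypothetical: a mention loses half its weight every 30 days

def recency_weight(published: datetime, now: datetime) -> float:
    """Exponential decay: 1.0 for a mention published now, 0.5 after one half-life."""
    age_days = max((now - published).total_seconds() / 86400, 0.0)
    return math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

def trust_score(mentions: List[Tuple[str, datetime]], avg_sentiment: float, now: datetime) -> float:
    """mentions: (label, published_at) pairs, label in {"drama", "good_action", "neutral"}."""
    score = 50.0  # base score
    for label, published in mentions:
        weight = recency_weight(published, now)
        if label == "good_action":
            score += 10 * weight
        elif label == "drama":
            score -= 15 * weight
    score += avg_sentiment * 20
    return max(0.0, min(100.0, score))  # clamp to the 0-100 range

if __name__ == "__main__":
    now = datetime(2025, 1, 31, tzinfo=timezone.utc)
    example = [("good_action", now), ("drama", datetime(2025, 1, 1, tzinfo=timezone.utc))]
    print(round(trust_score(example, 0.25, now), 2))
```

In the usage example, the good action from today has weight 1 and the drama from 30 days earlier has weight 0.5, so the score is 50 + 10 - 7.5 + 5, about 57.5.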
Configuration:
Edit config.py to customize:
- Sentiment model
- Drama/good action keywords
- News sources
- Scraping parameters
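For orientation, a config.py covering those four settings might look like the sketch below. Every variable name and numeric value is a hypothetical placeholder, not the project's actual configuration; only the model ID, the example keywords and the two news outlets come from this README.

```python
# config.py: hypothetical sketch; the real file's names and values may differ.

# Hugging Face model used for French sentiment analysis
SENTIMENT_MODEL = "cmarkea/distilcamembert-base-sentiment"

# Keywords that, combined with sentiment, trigger each classification
DRAMA_KEYWORDS = ["scandale", "polémique"]
GOOD_ACTION_KEYWORDS = ["don", "charité"]

# News sites to search (display name -> base URL)
NEWS_SOURCES = {
    "Le Monde": "https://www.lemonde.fr",
    "Le Figaro": "https://www.lefigaro.fr",
}

# Scraping parameters (placeholder values)
MAX_RESULTS_PER_SOURCE = 20
REQUEST_TIMEOUT_SECONDS = 10
```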
Database:
The SQLite database stores:
- Influencers: Name, trust score, counts, timestamps
- Mentions: All scraped data with sentiment labels
- Analysis History: Historical trust scores
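A possible layout for these three tables, sketched with the standard-library sqlite3 module rather than the SQLAlchemy models the project uses. Table and column names are assumptions, as are the helpers save_analysis and cached_score; the latter shows how a cache lookup like the sidebar's "Utiliser le cache" option could read a previously stored score instead of re-scraping.

```python
import sqlite3
from typing import Optional

# Hypothetical schema mirroring the three tables described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS influencers (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    trust_score REAL,
    drama_count INTEGER DEFAULT 0,
    good_action_count INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    influencer_id INTEGER NOT NULL REFERENCES influencers(id),
    source TEXT, url TEXT, text TEXT,
    sentiment TEXT, classification TEXT, published_at TEXT
);
CREATE TABLE IF NOT EXISTS analysis_history (
    id INTEGER PRIMARY KEY,
    influencer_id INTEGER NOT NULL REFERENCES influencers(id),
    trust_score REAL NOT NULL,
    analyzed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def save_analysis(conn: sqlite3.Connection, name: str, score: float,
                  drama_count: int, good_action_count: int) -> int:
    """Create or update the influencer row, then append the score to the history."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO influencers (name) VALUES (?)", (name,))
        conn.execute(
            "UPDATE influencers SET trust_score = ?, drama_count = ?, "
            "good_action_count = ?, updated_at = CURRENT_TIMESTAMP WHERE name = ?",
            (score, drama_count, good_action_count, name),
        )
        (influencer_id,) = conn.execute(
            "SELECT id FROM influencers WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO analysis_history (influencer_id, trust_score) VALUES (?, ?)",
            (influencer_id, score),
        )
    return influencer_id

def cached_score(conn: sqlite3.Connection, name: str) -> Optional[float]:
    """Return the last stored trust score, or None if the influencer was never analyzed."""
    row = conn.execute("SELECT trust_score FROM influencers WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

Keeping the latest score on the influencers row makes cache lookups a single query, while analysis_history keeps every past score for trend charts.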
Limitations:
- Web scraping depends on site availability and page structure
- Some platforms (e.g., Twitter/X) may require API keys for better results
- Sentiment analysis accuracy depends on text quality
- Rate limiting may slow down scraping
Troubleshooting:
If the sentiment model fails to download, pre-download it manually:
    python -c "from transformers import AutoTokenizer, AutoModelForSequenceClassification; AutoTokenizer.from_pretrained('cmarkea/distilcamembert-base-sentiment'); AutoModelForSequenceClassification.from_pretrained('cmarkea/distilcamembert-base-sentiment')"
- Check your internet connection
- Some sites may block scrapers; this is expected
- The app will continue with the data that is available
This project is for educational purposes (Hackathon).
Built for the Blackbox Hackathon 2025.
Tech Stack:
- Backend: Python, asyncio, aiohttp
- UI: Streamlit
- Scraping: BeautifulSoup, Newspaper3k
- AI: HuggingFace Transformers (CamemBERT)
- Database: SQLite, SQLAlchemy
- Visualization: Plotly
Made with ❤️ for the Blackbox Hackathon