prathham-k21/text-summarizer

SummarAIze — NLP Text Summarizer v3

A full-stack extractive text summarization app built with Node.js + Express (backend) and Angular 17 (frontend), featuring TextRank, TF-IDF, and Word Frequency algorithms, plus Named Entity Recognition and Flesch-Kincaid Readability scoring.


Project Structure

text-summarizer/
├── README.md
│
├── backend/
│   ├── package.json
│   └── src/
│       ├── server.js                        # Express app entry point
│       ├── routes/
│       │   └── summarize.routes.js          # POST /api/summarize, /api/summarize/stats
│       ├── controllers/
│       │   └── summarize.controller.js      # Request handlers
│       ├── services/
│       │   └── summarizer.service.js        # TextRank, TF-IDF, Frequency, NER, Readability
│       └── middleware/
│           ├── validateRequest.js           # express-validator middleware
│           └── errorHandler.js              # Global error handler
│
└── frontend/
    ├── angular.json
    ├── tsconfig.json
    ├── tsconfig.app.json
    ├── package.json
    └── src/
        ├── main.ts                          # Bootstrap
        ├── index.html
        ├── styles.scss                      # Global styles
        ├── environments/
        │   └── environment.ts
        └── app/
            ├── app.component.ts
            ├── app.config.ts                # Standalone app config
            ├── app.routes.ts
            ├── models/
            │   └── summarizer.model.ts      # TypeScript interfaces (incl. Entities, Readability)
            ├── services/
            │   └── summarizer.service.ts    # HttpClient API service
            └── components/
                └── summarizer/
                    ├── summarizer.component.ts    # Signals, computed, 5-tab output
                    ├── summarizer.component.html  # Full UI template
                    └── summarizer.component.scss  # Dark terminal styles

Getting Started

1. Backend

cd backend
npm install
npm run dev        # nodemon hot reload
# API runs on http://localhost:3000

2. Frontend

cd frontend
npm install
npm start          # Angular dev server
# App runs on http://localhost:4200

API Reference

POST /api/summarize

Request body:

{
  "text": "Your long text here...",
  "numSentences": 5,
  "method": "textrank"
}

Field         Type    Required  Description
text          string  Yes       Min 100, max 50,000 chars
numSentences  number  No        1–20 (default: 5)
method        string  No        textrank, tfidf, or frequency (default: textrank)

Response:

{
  "success": true,
  "data": {
    "summary": "Combined summary text...",
    "summarySentences": ["Sentence 1.", "Sentence 2."],
    "scores": [{ "index": 0, "score": 0.0842 }],
    "stats": {
      "originalWords": 850,
      "originalSentences": 13,
      "summaryWords": 120,
      "summarySentences": 5,
      "compressionRatio": 61.5,
      "readingTimeOriginal": 5,
      "readingTimeSummary": 1
    },
    "keywords": [{ "word": "learning", "score": 4.12 }],
    "entities": {
      "people": ["Steve Jobs"],
      "places": ["California", "United States"],
      "organizations": ["Google", "Microsoft"],
      "numbers": ["2024", "42"]
    },
    "readability": {
      "original": {
        "readingEase": 28.4,
        "easeLabel": "Very Difficult",
        "gradeLevel": 14.2,
        "avgWordsPerSentence": 22.1,
        "avgSyllablesPerWord": 1.84
      },
      "summary": {
        "readingEase": 31.0,
        "easeLabel": "Difficult",
        "gradeLevel": 13.8,
        "avgWordsPerSentence": 20.4,
        "avgSyllablesPerWord": 1.79
      }
    },
    "method": "textrank"
  }
}
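
As a rough illustration of calling this endpoint from Node.js, the sketch below validates input against the documented limits before posting it. The helper names (buildSummarizeRequest, summarize) are hypothetical, the base URL assumes the dev server from Getting Started, the integer check on numSentences is an assumption, and the global fetch requires Node 18 or later.

```javascript
// Assumption: the backend from "Getting Started" is listening on localhost:3000.
const API_BASE = "http://localhost:3000";
const METHODS = ["textrank", "tfidf", "frequency"];

// Hypothetical helper: mirrors the documented request limits so bad input
// fails before any network call is made.
function buildSummarizeRequest(text, numSentences = 5, method = "textrank") {
  if (typeof text !== "string" || text.length < 100 || text.length > 50000) {
    throw new RangeError("text must be a string of 100 to 50,000 characters");
  }
  if (!Number.isInteger(numSentences) || numSentences < 1 || numSentences > 20) {
    throw new RangeError("numSentences must be an integer from 1 to 20");
  }
  if (!METHODS.includes(method)) {
    throw new RangeError(`method must be one of: ${METHODS.join(", ")}`);
  }
  return { text, numSentences, method };
}

// Hypothetical helper: posts the request and unwraps the `data` field of the response.
async function summarize(text, numSentences, method) {
  const response = await fetch(`${API_BASE}/api/summarize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSummarizeRequest(text, numSentences, method)),
  });
  const payload = await response.json();
  if (!response.ok || !payload.success) {
    throw new Error(`Summarize request failed with HTTP ${response.status}`);
  }
  return payload.data;
}
```

With the backend running, summarize(longText, 3, "tfidf") would resolve to the data object shown in the response above.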

POST /api/summarize/stats

Returns text statistics, entities, and readability scores without generating a summary.


How the Algorithms Work

TextRank (default — graph-based)

  1. Each sentence is a node in a graph
  2. Edge weights = cosine similarity between sentence word vectors
  3. Similarity matrix is row-normalized
  4. Power iteration (PageRank, 30 rounds, damping=0.85) converges on sentence importance scores
  5. Top-N sentences selected and reordered to preserve original flow
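
The five steps above can be sketched in plain JavaScript as follows. This is a minimal illustration, not the code in summarizer.service.js: the regex tokenizer stands in for the natural and stopword libraries, the function names are hypothetical, and zeroing the diagonal of the similarity matrix is an assumption.

```javascript
// Hypothetical tokenizer: lowercase word tokens only.
const tokenize = (s) => s.toLowerCase().match(/[a-z0-9']+/g) ?? [];

// Bag-of-words vector for one sentence, stored as a Map of term -> count.
function termCounts(sentence) {
  const counts = new Map();
  for (const w of tokenize(sentence)) counts.set(w, (counts.get(w) ?? 0) + 1);
  return counts;
}

// Cosine similarity between two term-count vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [w, c] of a) { normA += c * c; dot += c * (b.get(w) ?? 0); }
  for (const c of b.values()) normB += c * c;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

function textRank(sentences, topN, rounds = 30, damping = 0.85) {
  const n = sentences.length;
  if (n === 0) return [];
  const vectors = sentences.map(termCounts);
  // Steps 1-3: similarity matrix (zero diagonal, an assumption), row-normalized.
  const weights = vectors.map((v, i) => {
    const row = vectors.map((u, j) => (i === j ? 0 : cosine(v, u)));
    const total = row.reduce((s, x) => s + x, 0);
    return row.map((x) => (total > 0 ? x / total : 0));
  });
  // Step 4: power iteration; sentence i collects score from every sentence j linked to it.
  let scores = new Array(n).fill(1 / n);
  for (let r = 0; r < rounds; r++) {
    scores = scores.map((_, i) =>
      (1 - damping) / n +
      damping * scores.reduce((sum, sj, j) => sum + sj * weights[j][i], 0)
    );
  }
  // Step 5: keep the top-N sentences, then restore document order.
  return scores
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .sort((a, b) => a.index - b.index);
}
```

The returned objects use the same { index, score } shape as the scores array in the API response, so the summary is just the sentences at those indices joined in order.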

TF-IDF

  1. Each sentence is treated as a document
  2. TF-IDF score computed per word per sentence
  3. Sentence score = average TF-IDF of its significant words
  4. Rare but meaningful words are weighted higher
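
A minimal sketch of this scoring, assuming a tiny hard-coded stop-word list and a smoothed IDF; the project itself relies on natural and stopword, and its exact weighting may differ. The names STOP, splitWords, and tfidfSentenceScores are hypothetical.

```javascript
// Hypothetical stop-word list and tokenizer, for illustration only.
const STOP = new Set(["the", "a", "an", "and", "of", "to", "in", "is", "on", "it"]);
const splitWords = (s) =>
  (s.toLowerCase().match(/[a-z']+/g) ?? []).filter((w) => !STOP.has(w));

function tfidfSentenceScores(sentences) {
  // Step 1: each sentence is one "document".
  const docs = sentences.map(splitWords);
  // Document frequency: the number of sentences each word appears in.
  const df = new Map();
  for (const doc of docs) {
    for (const w of new Set(doc)) df.set(w, (df.get(w) ?? 0) + 1);
  }
  return docs.map((doc, index) => {
    if (doc.length === 0) return { index, score: 0 };
    const tf = new Map();
    for (const w of doc) tf.set(w, (tf.get(w) ?? 0) + 1);
    let total = 0;
    for (const [w, count] of tf) {
      // Step 2: TF-IDF per word. The smoothed IDF is an assumption; it keeps
      // words that occur in every sentence slightly above zero.
      const idf = Math.log((1 + docs.length) / (1 + df.get(w))) + 1;
      total += (count / doc.length) * idf;
    }
    // Step 3: sentence score = average TF-IDF over its distinct significant words.
    return { index, score: total / tf.size };
  });
}
```

Given "cats purr", "cats purr", and "cats erupt", the third sentence scores highest because "erupt" appears in only one sentence, which is the effect step 4 describes.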

Word Frequency

  1. Word frequency map built from entire text
  2. Stop words and numbers removed
  3. Frequencies normalized against the max
  4. Sentences scored by average normalized frequency of their words

This method works best for news articles with repeated key terms.
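
The frequency method can be sketched as below. The stop-word list and tokenizer are hypothetical stand-ins for the stopword and natural libraries, and frequencyScores is an illustrative name rather than the service's actual function.

```javascript
// Hypothetical stop-word list, for illustration only.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "is", "are", "on"]);

// Letter-initial tokens only, so numbers such as "2024" or "15th" are dropped
// together with stop words (step 2).
const contentWords = (s) =>
  (s.toLowerCase().match(/\b[a-z][a-z']*\b/g) ?? []).filter((w) => !STOP_WORDS.has(w));

function frequencyScores(sentences) {
  // Step 1: frequency map over the entire text.
  const freq = new Map();
  for (const w of sentences.flatMap(contentWords)) {
    freq.set(w, (freq.get(w) ?? 0) + 1);
  }
  // Step 3: normalize against the most frequent word.
  const max = Math.max(1, ...freq.values());
  // Step 4: each sentence scores the average normalized frequency of its words.
  return sentences.map((sentence, index) => {
    const ws = contentWords(sentence);
    const sum = ws.reduce((s, w) => s + freq.get(w) / max, 0);
    return { index, score: ws.length ? sum / ws.length : 0 };
  });
}
```

Because the score is an average rather than a sum, long sentences are not favored simply for containing more words.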

Named Entity Recognition (NER)

Uses the compromise NLP library to extract:

Entity Type       Example
People            Steve Jobs, Elon Musk
Places            California, United States, London
Organizations     Google, Microsoft, OpenAI
Numbers & Values  2024, 42 billion, 15th
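
One way the extraction might look, sketched under assumptions: extractEntities is a hypothetical name, the nlp function is passed in rather than imported so the snippet does not depend on how the library is loaded, and the numbers() call assumes a compromise release that ships it by default (older releases needed the separate compromise-numbers plugin).

```javascript
// `nlp` is expected to be the function exported by the `compromise` package,
// e.g. const nlp = require("compromise").
function extractEntities(nlp, text) {
  const doc = nlp(text);
  // Deduplicate matches while keeping first-seen order.
  const unique = (view) => [...new Set(view.out("array"))];
  return {
    people: unique(doc.people()),
    places: unique(doc.places()),
    organizations: unique(doc.organizations()),
    // Assumption: numbers() is available in the installed compromise version.
    numbers: unique(doc.numbers()),
  };
}
```

The returned object has the same four keys as the entities field in the API response.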

Readability — Flesch-Kincaid

Two scores computed for both the original text and the summary:

Reading Ease (0–100, higher = easier)

206.835 − (1.015 × avg words/sentence) − (84.6 × avg syllables/word)

Score   Label
90–100  Very Easy
70–89   Easy
60–69   Standard
50–59   Fairly Difficult
30–49   Difficult
0–29    Very Difficult

Grade Level (US school grade)

(0.39 × avg words/sentence) + (11.8 × avg syllables/word) − 15.59
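
Both formulas and the label table translate directly into code. The sketch below is illustrative only: the vowel-group syllable counter and the regex sentence splitter are assumptions (the project may count both differently), and fractional scores are mapped to a label by the lower bound of each band.

```javascript
// Heuristic syllable counter (an assumption): counts vowel groups and
// ignores a trailing silent "e".
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length <= 3) return w.length ? 1 : 0;
  const groups = w.replace(/e$/, "").match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// wps = average words per sentence, spw = average syllables per word.
const readingEase = (wps, spw) => 206.835 - 1.015 * wps - 84.6 * spw;
const gradeLevel = (wps, spw) => 0.39 * wps + 11.8 * spw - 15.59;

// Maps a Reading Ease score to the labels in the table above.
function easeLabel(score) {
  if (score >= 90) return "Very Easy";
  if (score >= 70) return "Easy";
  if (score >= 60) return "Standard";
  if (score >= 50) return "Fairly Difficult";
  if (score >= 30) return "Difficult";
  return "Very Difficult";
}

function readability(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.match(/[A-Za-z']+/g) ?? [];
  if (sentences.length === 0 || words.length === 0) return null;
  const wps = words.length / sentences.length;
  const spw = words.reduce((sum, w) => sum + countSyllables(w), 0) / words.length;
  const ease = readingEase(wps, spw);
  const round = (x, digits) => Number(x.toFixed(digits));
  return {
    readingEase: round(ease, 1),
    easeLabel: easeLabel(ease),
    gradeLevel: round(gradeLevel(wps, spw), 1),
    avgWordsPerSentence: round(wps, 1),
    avgSyllablesPerWord: round(spw, 2),
  };
}
```

Very short, simple text can score above 100 on Reading Ease, so a caller that wants the 0–100 range would need to clamp the result.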

Output Tabs

Tab          Content
Summary      Extracted sentences numbered in order, copy button
Keywords     Top 10 TF-IDF keywords with score bar chart
Entities     Color-coded tags: People (purple), Places (green), Orgs (yellow), Numbers (blue)
Readability  Flesch-Kincaid ease meter + grade level for original vs summary
Stats        Word counts, sentence counts, compression ratio, reading time

Key Libraries

Library               Purpose
natural               Tokenization, TF-IDF, sentence splitting
compromise            Named entity recognition (people, places, orgs)
stopword              Stop-word removal
express               REST API server
express-validator     Input validation
helmet                HTTP security headers
express-rate-limit    Rate limiting (100 requests per 15 minutes)
@angular/common/http  HttpClient for API calls
@angular/animations   Fade-in and list stagger animations

Angular Features Used

  • Standalone components (Angular 17, no NgModule)
  • Signals (signal(), computed()) for reactive state
  • HttpClient with typed Observable responses
  • Animations (@fadeIn, @listStagger)
  • FormsModule with ngModel two-way binding
  • RxJS catchError, map for error handling
