Kinda failures #16

@everythingability

Hello, I'm parsing an academic book (with Flair NER) that contains "dates" such as:

  • more than 500 years ago
  • the fourteenth century
  • the Middle Ages
  • the sixth century
  • the year 105 CE
  • 1962
  • 1967
  • 1952
  • the eighteenth century
  • 1887
  • 1993
  • 1925
  • the century
  • today
  • 1952
  • 1979
  • around 5000 years old
  • the thirty-second century
  • a thousand or more years
  • the first century
  • the sixth century BCE
  • 1996
  • 1998
  • 1999
  • 2017
  • 1542
  • 6th century
  • the 2nd century
  • the 5th century BCE
  • the 7th century CE
  • the last decade
  • 1473
  • 1500
  • 1599
  • the 14th century
  • 1894
  • 1989
  • 1990
  • 9 April 1945
  • July
  • March 1944
  • May 1933
  • the first months
  • Today
  • fifteen hundred years ago
  • 1925
  • at least 3000 years
  • the 3rd century BCE in Qing Dynasty
  • 5000 years ago
  • seventy years
  • 2015
  • the 21st century
  • 1814
  • 2000 BCE
  • 1815
  • Christmas Eve, 1851
  • today
  • 1998
  • 2014
  • 2025
  • those days
  • 1983
  • 1991
  • 1996
  • twenty years
  • 1555
  • the twelfth century
  • today
  • 1998
  • 2017
  • twenty years from 1998
  • 1969
  • 1974
  • 1998
  • 2001
  • April 2002
  • every year
  • monthly
  • 1991
  • 2002
  • 2003
  • 2004
  • 2014
  • four years
  • 1269
  • a hundred years
  • seventh century
  • the 9th century
  • the Yuan dynasty
  • 2015
  • today

and ideally want a "close enough for jazz" year (or not) from a parse... but many of the dates like these fail... even ones that seem pretty easy to get like "2000 BCE".

I even tried...

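# Normalise the era marker, then turn numeric ordinals into a rough year,
# e.g. "the 14th century" -> "the 1400" (spelled-out ordinals are untouched).
theDate = theDate.replace("BCE", "BC")
theDate = theDate.replace("th century", "00")
theDate = theDate.replace("rd century", "00")
theDate = theDate.replace("st century", "00")
theDate = theDate.replace("nd century", "00")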

...to help it out a bit, but it also fails on items like "the fourteenth century".
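For reference, the kind of "close enough" fallback being asked for could be sketched roughly as below. This is not part of the library under discussion: `approximate_year`, `ORDINAL_WORDS`, and the regex names are hypothetical, and returning the midpoint of a century (with negative numbers standing for BCE) is an assumed convention. Durations such as "5000 years ago" and vague phrases such as "today" are deliberately left unresolved and return `None`.

```python
import re
from typing import Optional

# Spelled-out ordinals from "first" to "thirty-ninth" (hypothetical helper table).
_UNITS = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
          "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
          "fourteenth", "fifteenth", "sixteenth", "seventeenth",
          "eighteenth", "nineteenth"]
ORDINAL_WORDS = {word: n for n, word in enumerate(_UNITS, start=1)}
ORDINAL_WORDS.update({"twentieth": 20, "thirtieth": 30})
for tens_word, tens in (("twenty", 20), ("thirty", 30)):
    for n, word in enumerate(_UNITS[:9], start=1):
        ORDINAL_WORDS[f"{tens_word}-{word}"] = tens + n

_ERA = r"(?:\s*(BCE|BC|CE|AD)\b)?"
_NUM_CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\s+century" + _ERA, re.I)
_WORD_CENTURY = re.compile(r"\b([a-z]+(?:-[a-z]+)?)\s+century" + _ERA, re.I)
_YEAR_WITH_ERA = re.compile(r"\b(\d{1,4})\s*(BCE|BC|CE|AD)\b", re.I)
# A bare four-digit number, unless it is a duration such as "5000 years".
_BARE_YEAR = re.compile(r"\b(\d{4})\b(?!\s+years?\b)")


def _signed(year: int, era: Optional[str]) -> int:
    """Return the year as a negative number when the era is BC/BCE."""
    return -year if era and era.upper() in ("BC", "BCE") else year


def approximate_year(text: str) -> Optional[int]:
    """Best-effort single year for a date-like phrase, or None."""
    match = _NUM_CENTURY.search(text)
    if match:
        # Assumed convention: the midpoint of the century.
        return _signed((int(match.group(1)) - 1) * 100 + 50, match.group(2))
    match = _WORD_CENTURY.search(text)
    if match and match.group(1).lower() in ORDINAL_WORDS:
        century = ORDINAL_WORDS[match.group(1).lower()]
        return _signed((century - 1) * 100 + 50, match.group(2))
    match = _YEAR_WITH_ERA.search(text)
    if match:
        return _signed(int(match.group(1)), match.group(2))
    match = _BARE_YEAR.search(text)
    if match:
        return int(match.group(1))
    return None


for phrase in ["2000 BCE", "the fourteenth century", "the 5th century BCE",
               "9 April 1945", "around 5000 years old", "today"]:
    print(f"{phrase!r}: {approximate_year(phrase)}")
```

Something like this could run only after the real parser has failed, so that phrases it already handles are unaffected.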

Thanks for the work though. I found a Python 2.7 module called dataparse that might be worth a peek? I couldn't install it, though.

Tom
