Skip to content
This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Supporting pathlib's Path objects in FileDataStream#377

Open
pnshinde wants to merge 2 commits into
microsoft:masterfrom
pnshinde:path-support
Open

Supporting pathlib's Path objects in FileDataStream#377
pnshinde wants to merge 2 commits into
microsoft:masterfrom
pnshinde:path-support

Conversation

@pnshinde

Copy link
Copy Markdown

Fixes #269 . pathlib's Path objects can be converted to strings just by casting, and vice versa. I added a check in FileDataStream's init function to convert a Path object to a string. I also wrote a test for this, but calling FileDataStream.read_csv() on a Path object produces the following error (using tool=None/'pandas'):

Screen Shot 2019-11-28 at 7 34 09 PM

@ganik do I need to define my own schema for a file path as a Path object, even though the contents of the file should be the same? I'm a little stuck here and any help would be appreciated!

@msftclas

msftclas commented Nov 29, 2019

Copy link
Copy Markdown

CLA assistant check
All CLA requirements met.

@pieths

pieths commented Dec 2, 2019

Copy link
Copy Markdown
Collaborator

Will this work with Python 2.7? It looks like pathlib was added in version 3.4.

with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
df.to_csv(f, sep=',', index=False)

fi = FileDataStream.read_csv(Path(f.name), sep=',')

@pieths pieths Dec 2, 2019

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test is failing here. Looks like a similar change might also be required in internal\utils\data_schema.py (in read_schema(*data, **options)).

I haven't tested it but maybe updating the part of the code around line 844 might fix the issue:

elif hasattr(X, 'read') or isinstance(X, str) or (
                    six.PY2 and isinstance(X, (str, text_type))):

@pnshinde

pnshinde commented Dec 8, 2019

Copy link
Copy Markdown
Author

It looks like you're right, and that Path objects won't work with python 2.7. I tried several different things to try and get it to work but with no luck. Do you have any suggestions on how to proceed? What if we checked the system python version and used Path objects only when that value is greater than 3.4?

@pieths

pieths commented Dec 12, 2019

Copy link
Copy Markdown
Collaborator

It looks like you're right, and that Path objects won't work with python 2.7. I tried several different things to try and get it to work but with no luck. Do you have any suggestions on how to proceed? What if we checked the system python version and used Path objects only when that value is greater than 3.4?

Checking for a particular Python version is done in some other parts of the code (including the snippet in my previous comment). Search for six.PY2 in the code base for more examples.

@ianlini

ianlini commented Jan 10, 2020

Copy link
Copy Markdown
  1. For Python < 3.4, you can use pathlib2, but I don't think that is the best practice. BTW, I think Path.resolve() should be called by the user, but not by the library.

  2. According to PEP 519, you can implement a function like os.fspath (added in Python 3.6) by path.__fspath__() if hasattr(path, "__fspath__") else path. The definition of path-like object is actually:

    A path-like object is either a str or bytes object representing a path, or an object implementing the os.PathLike protocol.

    so we shouldn't use pathlib.Path to determine whether an object is path-like.

  3. I'm not sure what the best practice is for testing. Maybe you can go with pathlib2, or implement a class with __fspath__.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support pathlib in FileDataStream

4 participants