Trouble downloading data #10

Closed
jtunnell opened this issue Nov 2, 2018 · 33 comments

Comments

@jtunnell

jtunnell commented Nov 2, 2018

After making an account to download the data, I keep getting an error. On a Linux machine (Ubuntu 16.04) I get the error "This XML file does not appear to have any style information associated with it" and the file will not download. On a Windows machine I can download the data, but when I extract it using WinRAR I get just the name of the tar file with no file extension. According to your documentation, it should extract into a folder, but that doesn't seem to be the case. Let me know what I need to do to fix the issue. Thank you.

@holger-motional
Contributor

Hi Jordan. Thanks for your interest in nuScenes. We apologize for these problems.

  1. Regarding the XML error, this seems to be a rare communication issue between our hosting provider and AWS. See this for more details. We are working on implementing that fix. Meanwhile, it's good that you tried another machine.

  2. I can confirm your issue with WinRAR not recognizing the file format properly. A temporary workaround is to rename the .tar files to .tbz2. That works for me. We will rename these files on our end soon.

Let me know if you have any other questions.
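
The rename workaround suggests that the downloads are bzip2-compressed tar archives despite the `.tar` name (an inference from the `.tbz2` extension, not something stated outright above). As a minimal sketch under that assumption, Python's standard `tarfile` module can detect the compression from the file contents instead of the extension. The function name `extract_archive` and its arguments are illustrative only.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive, detecting gzip/bzip2/xz compression automatically.

    Returns the list of member names stored in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" makes tarfile sniff the compression from the file contents,
    # so a bzip2 archive that is misnamed ".tar" still opens.
    with tarfile.open(archive_path, mode="r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe paths inside the archive.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

With this approach the file does not need to be renamed before extraction.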

@jtunnell
Author

jtunnell commented Nov 4, 2018

I tried your fix for Windows by renaming the file extension. It worked for me! Thank you. I am excited to work with this dataset.

@jtunnell jtunnell closed this as completed Nov 4, 2018
@jtunnell jtunnell reopened this Nov 4, 2018
@jtunnell
Author

jtunnell commented Nov 4, 2018

I am now getting errors on a Windows machine using WinRAR: "data\nuscenes\nuscenes_teaser_pointclouds_samples_v1.tbz2: The archive is corrupt" and "data\nuscenes\nuscenes_teaser_pointclouds_sweeps_v1.tbz2: The archive is corrupt". This is happening to all of the folders except the metadata file. Most of them error out before half of the data is extracted. When I try to download the data again to see if it was corrupted during download, I get the same "This XML file does not appear to have any style information associated with it" error again. I know you are working on this issue, but I think the first issue may be related in some way.

@Qiang-Xu
Contributor

Qiang-Xu commented Nov 5, 2018

Hello jtunnell, you need to refresh the page to download these files again. The links usually expire after 10 minutes or so, so refresh the page to get new links. Thanks!

@jtunnell
Author

jtunnell commented Nov 5, 2018

I downloaded the data again today and made sure that I hard-reloaded the page to get fresh links. I no longer get the XML file error, but I am still getting the error "data\nuscenes\nuscenes_teaser_pointclouds_samples_v1.tbz2: The archive is corrupt". This is happening for all of the folders except for the meta. If the links expire in 10 minutes and the download takes much longer than 10 minutes, could that be causing the issue? I don't think so, but it might be. It seems that the download isn't completing, as I am only able to extract a portion of the data successfully. Thank you for your help.

@Qiang-Xu
Contributor

Qiang-Xu commented Nov 6, 2018

I mean that once you start downloading, you should be able to download the whole file. It may be that the download did not complete, which is usually caused by a network issue. Do you use a browser to download, or something else? @jtunnell

@Qiang-Xu
Contributor

Qiang-Xu commented Nov 6, 2018

Maybe you can try wget, for example:

# wget -c -O filename "url", for example:
wget -c -O nuscenes_teaser_images_samples_v1.tbz2 "copy_and_paste_your_url_here"

-c means continue, so if your connection breaks, you can refresh the page, get a new URL, and resume the download.
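
For readers without wget, the resume behaviour can be approximated with an HTTP `Range` request. The following is a rough sketch using only Python's standard library; it assumes the server honours range requests (responding with status 206), which this thread does not confirm for the download host. The function names are hypothetical.

```python
import os
import shutil
import urllib.request


def build_resume_request(url, dest):
    """Build a request that asks only for the bytes missing from `dest`."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server to start sending at the first byte we do not have.
        request.add_header("Range", "bytes={}-".format(offset))
    return request, offset


def download_with_resume(url, dest, chunk_size=1 << 20):
    """Download `url` to `dest`, appending to a partial file when possible."""
    request, offset = build_resume_request(url, dest)
    # Note: if `dest` is already complete, many servers answer 416 and
    # urlopen raises urllib.error.HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; otherwise start from scratch.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(response, out, chunk_size)
```

As with `wget -c`, the URL passed on a later attempt can be a freshly generated link for the same file.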

@jtunnell
Author

jtunnell commented Nov 6, 2018

I will give wget a try. Chrome indicates that every file downloads completely. However, extraction always fails. I will try this on a few machines tomorrow.

@jtunnell
Author

jtunnell commented Nov 6, 2018

I used the wget method to download the lidar and radar point cloud samples and I still got the same issue. Is it possible that something is wrong with the file itself? Also, a question about the radar: what raw radar information do we get? Can we read the files ourselves, or do we have to use your code to work with the data?

@Qiang-Xu
Contributor

Qiang-Xu commented Nov 7, 2018

I mean you need to make sure that you download the whole file.
image

For example, the wget output shows how big the file is and how far the download has progressed. I suggest you check the file size on your local disk. The whole point of using wget is that you can continue downloading the file from where it left off. (But be careful with the command: use the corresponding URL for each file.)

@holger-nutonomy I suggest we show file length and md5sum or sha1sum on download page.
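
Once a file length and checksum are published, a download could be checked along these lines. This is a minimal sketch using Python's `hashlib`; the expected size and digest would come from the download page, and the function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so that large archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(path, expected_size, expected_digest, algorithm="md5"):
    """Return True only if both the size and the checksum match."""
    # The size check is cheap and catches truncated downloads immediately.
    if os.path.getsize(path) != expected_size:
        return False
    return file_digest(path, algorithm) == expected_digest.lower()
```

Passing `algorithm="sha1"` covers the sha1sum variant mentioned above.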

@holger-motional
Contributor

@Qiang-Xu Yes, let's show md5sums.

@jtunnell
The radar data comes in the standardized Point Cloud Data (PCD) file format, but it is best to use our code to read it.
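
For anyone who wants to inspect a file by hand, the sketch below parses only the text header that every PCD file starts with (entries such as `FIELDS`, `POINTS`, and `DATA`). It relies on the generic PCD layout and makes no assumption about which fields the nuScenes radar files contain; the devkit remains the recommended reader. `read_pcd_header` is a hypothetical helper name.

```python
def read_pcd_header(path):
    """Parse the ASCII header of a PCD file into a dict of token lists.

    The extra key "data_offset" holds the byte position where the point
    data begins, i.e. just after the DATA line.
    """
    header = {}
    with open(path, "rb") as handle:
        while True:
            raw = handle.readline()
            if not raw:
                raise ValueError("no DATA line found; not a PCD file?")
            line = raw.decode("ascii", errors="replace").strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition(" ")
            header[key.upper()] = value.split()
            if key.upper() == "DATA":
                # DATA is the last header entry; the payload follows it.
                header["data_offset"] = handle.tell()
                return header
```

The returned `FIELDS`, `SIZE`, `TYPE`, and `COUNT` entries describe how each point is laid out in the payload.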

@jtunnell
Author

jtunnell commented Nov 8, 2018

I have used wget on a few computers and I get the full download. I have double-checked that I used the correct commands. I don't think this is an issue of getting the whole file. I have also tried downloading from your webpage again. Every time I extract the data I get the error "data\nuscenes\nuscenes_teaser_pointclouds_samples_v1.tbz2: The archive is corrupt". It always seems to happen at this file:

image
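
One way to separate a damaged download from a problem in the extraction tool is to read the archive end to end with a second implementation. The sketch below does this with Python's `tarfile` module without writing anything to disk; it is only a diagnostic idea, not a step suggested in this thread, and `archive_is_readable` is a hypothetical name.

```python
import tarfile


def archive_is_readable(archive_path, chunk_size=1 << 20):
    """Return True if every member of the archive can be read completely."""
    try:
        with tarfile.open(archive_path, mode="r:*") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                stream = tar.extractfile(member)
                # Read and discard the data; truncation or corruption
                # surfaces as an exception during decompression.
                while stream.read(chunk_size):
                    pass
    except (tarfile.TarError, EOFError, OSError):
        return False
    return True
```

If this returns True for a file that WinRAR rejects, the extraction tool rather than the download is the more likely culprit.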

@Qiang-Xu
Contributor

Qiang-Xu commented Nov 9, 2018

Hey @jtunnell, sorry for the delay, I'm on a business trip this week. Could you create an account for me on one of your Linux machines that I can ssh into, or spin up a VM somewhere? I'd like to test it in your environment, because on my end it works. Let me know your thoughts.

@holger-motional
Contributor

@jtunnell I'm closing this issue for now, as we cannot reproduce it. Please let us know if you continue to have problems.

@shayanshir

I have the same "corrupted archive" issue in Windows.

@ChFernandez12

ChFernandez12 commented Jun 4, 2019

@holger-motional
Contributor

@ChFernandez12 Can you try again with a new key? The key expires after a while if you don't start (!) the download.
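
To check whether a link has already expired before starting a download, one could inspect the URL itself. This sketch assumes the link carries an `Expires` query parameter holding a Unix timestamp, as S3-style pre-signed URLs commonly do; if the parameter is absent, the function returns `None`. The helper name is hypothetical.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url, now=None):
    """Return seconds left before the link expires, or None if unknown.

    A negative result means the link has already expired.
    """
    query = parse_qs(urlparse(url).query)
    if "Expires" not in query:
        return None
    expires_at = int(query["Expires"][0])
    current = time.time() if now is None else now
    return expires_at - current
```

A negative or very small value means it is worth refreshing the download page for a new link first.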

@ChFernandez12

Yes, I have tried it several times with different keys from my ssh server. Any thoughts?

@holger-motional
Contributor

holger-motional commented Jun 4, 2019

@ChFernandez12 I've just tried this and it seems that the double quotes around the URL are needed. So please run:
wget -c -O v1.0-trainval02_blobs.tgz "https://s3.amazonaws.com/data.nuscenes.org/public/v1.0/v1.0-trainval02_blobs.tgz?AWSAccessKeyId=AKIA6RIK4RRMFUKM7AM2&Signature=XXXXXXXXXXXXXXXXXXXXXXX&Expires=1560083334"
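
A likely reason the quotes matter (offered as an explanation, not confirmed in this thread) is that POSIX shells treat an unquoted `&` as a command separator, so everything after the first `&` never reaches wget as part of the URL. When the command is built from a script, the sketch below shows two ways to avoid the problem: quoting with `shlex.quote`, or passing an argument list so that no shell parsing happens at all. Both function names are hypothetical.

```python
import shlex


def wget_command(url, output_name):
    """Build a shell-safe command string with the URL properly quoted."""
    return "wget -c -O {} {}".format(shlex.quote(output_name), shlex.quote(url))


def wget_argv(url, output_name):
    """Build an argument list, e.g. for subprocess.run without shell=True."""
    return ["wget", "-c", "-O", output_name, url]
```

Either form keeps the `Signature` and `Expires` parameters attached to the URL.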

@ChFernandez12

It worked! Thank you!

@linchunmian

After making an account to download the data, I found that the download button is greyed out and I cannot download the data. What should I do to download the nuScenes and nuImages data successfully? Please help!
Screenshot from 2021-05-26 22-04-56

@ChFernandez12

ChFernandez12 commented May 26, 2021 via email

@linchunmian

Thanks.
If I want to run a 3D LiDAR-image object detection experiment, which data do I need to download? In other words, I am confused about the difference between nuScenes-lidarseg and the full dataset (v1.0). I do not know which one contains the LiDAR data that corresponds to the images in the nuImages subset.

@holger-motional
Contributor

nuImages is a separate dataset that is 2D only.
nuScenes is the dataset with 3D object annotations.
nuScenes-lidarseg is an extension for nuScenes with lidar point-level labels.

@linchunmian

Thanks. So you mean that if I want to test my 3D detector's performance, the 'full dataset v1.0' is suitable for me?
Screenshot from 2021-05-27 15-16-19

@linchunmian

I cannot find the name 'nuScenes' on the official download page.

@holger-motional
Contributor

Exactly. Mini is good if you just want to run it on a small subset (10 scenes).

@linchunmian

Thanks for your kindness. Two things still confuse me:

  1. If I want to submit test results to the evaluation server, should I download the whole trainval and test data under 'the full dataset v1.0'?
  2. If I have downloaded the whole trainval and test data and want to construct the following file structure, what should I do?
    Should the samples and sweeps directories be extracted from each subset, e.g. v1.0-trainval01_blobs, v1.0-trainval02_blobs, and then moved into a unified 'samples' or 'sweeps' directory?
    Screenshot from 2021-05-27 17-18-05

Thanks again for any instructions.

@holger-motional
Contributor

  1. If you already have a model and you just want to run it, you can use test only.
  2. Yes, depending on which "zip" program you use, you need to merge all of the folders of the different archives.
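
If each archive was extracted into its own folder, the merge described in point 2 could be scripted roughly as follows. This is a sketch that assumes every extracted folder contains top-level `samples` and/or `sweeps` directories, as described in the question; the function name and arguments are hypothetical, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def merge_extracted(extracted_roots, dataset_root, subdirs=("samples", "sweeps")):
    """Copy the samples/sweeps trees of several extracted archives into one root."""
    dataset_root = Path(dataset_root)
    for root in extracted_roots:
        for sub in subdirs:
            source = Path(root) / sub
            if source.is_dir():
                # dirs_exist_ok lets later archives add files to folders
                # that an earlier archive already created.
                shutil.copytree(source, dataset_root / sub, dirs_exist_ok=True)
    return dataset_root
```

Copying temporarily doubles disk usage; extracting every archive directly into the same target directory should produce the same merged layout without the extra copy.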

@linchunmian

Thanks.

  1. I mean: if I want to train a model and submit its test results to the evaluation server, do I need to download the whole trainval data to train my model? In other words, if I only use the mini data for training, will detection performance suffer? I think it will, but I am not sure.

@holger-motional
Contributor

Yes, 10 scenes are not a lot (200s of data). Hence the mini split performance will be pretty bad.
You will need to train on the entire train (or trainval) set.

@holestine

Hello,

When I click on the US link to download the dataset, the download does not start and I get the following console message when I inspect the page in Chrome. Any idea why I can't download? Thanks.

Access to XMLHttpRequest at 'https://o9k5xn5546.execute-api.us-east-1.amazonaws.com/v1/archives/nuplan-v1.0/nuplan-maps-v1.0.zip?region=us&project=nuScenes' from origin 'https://www.nuscenes.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
instrument.ts:129 err Error: Network Error
at e.exports (createError.js:16:15)
at h.onerror (xhr.js:99:14)
at XMLHttpRequest.r (helpers.ts:88:17)
trycatch.ts:281 GET https://o9k5xn5546.execute-api.us-east-1.amazonaws.com/v1/archives/nuplan-v1.0/nuplan-maps-v1.0.zip?region=us&project=nuScenes net::ERR_FAILED 404

@whyekit-motional
Collaborator

@holestine If this issue is related to the nuPlan dataset, please post it at https://github.com/motional/nuplan-devkit
