Goodreads Librarians Group discussion
note: This topic has been closed to new comments.
Large Book Data Import

So if for example an author renames their book the old original book will be losing the ASIN and the new imported book will now show the ASIN?
Will the old book have any easily identifiable notes that the ASIN has been removed by this process?

Not a good idea. As Bookworm mentions, we have plenty of authors on Goodreads with a book-renaming fetish, not to mention a habit of swapping ASINs etc. between editions so that the newest cover has the ASIN. I can see a lot of duplicates being created, and the same authors will most probably mess with the new book in the same way as they messed with the old one. I would suggest that if you do decide to swap books, because that is what this is, the 'new' one should have a librarian note and a log entry added (not just a special queue log) pointing to the original book record, in the same way as we add one to an alternate edition, because that is also what this is.
In the future then, the alternate edition will be the one that has the ASIN and not the earliest edition of a work. Really poor idea, unless I am reading it wrong.

I sincerely hope the librarian manual and policies get updated to reflect that, if true. Seriously, that one's worth a mass email or PM to all librarians and Goodreads authors (if not an outright sitewide announcement).
I wish Amazon cared half as much about the Goodreads database and member book catalogs as they do about what is currently for sale.

Another issue, as described: when a book is published under a pseudonym, all later editions must have the same pseudonym as the primary author for combining etc. If the book is later published under the author's real name (or a new pseudonym), it won't match the author/ASIN on import, so it will become a new book record but WILL take the ASIN from the original version of the book. Geez. The audit trail will be entertaining.

Naturally I deleted it.

I've been noticing that too. Reversed a few of those in the last couple days.

This book was added by the amazon_kcw importer on 30 Nov
/book/show/1...
/book/edits/...
No edits since then until the amazon_sable importer decided to remove the ASIN and give it to a newly imported edition
This one. Added a few minutes ago by the amazon_sable importer
/book/show/2...
has a different spelling of the author (with illegal extra bits)
This is the entry in the big edits log (my bold):
amazon_sable updated the book The Bull, the Bear and the Planets : Trading the Financial Markets Using Astrology by Bucholtz B.Sc. MBA, M. G.
asin: 'B00CLMKGAE' to '' (undo)
(flag)
0 minutes ago (#61760687)
The edition that entry points to is the first one I listed above, which does show the ASIN being removed. But the newly imported edition that has been given that ASIN has no indication that the ASIN was moved; the log on that book doesn't show the ASIN at all. No audit trail. Books that were added to shelves via an ASIN now don't have one.
/book/edits/...
The four single editions are all the results of recent amazon imports.
/search?utf8...+
I notice that the importer is doing nearly 10,000 edits every 10 minutes.

/book/show/1...
/book/show/1...
/book/show/1...

/book/show/2...
Title includes published by info
Diagnostic Pathology: Familial Cancer Syndromes: Published by Amirsys
Author names with the MD, PHD etc labels.
Dr. Vania Nose M.D. Ph.D, Joel K. Greenson Md, Dr. Gladell P. Paner Md.
Publisher name in quotation marks (and different from the "published by" name in the title)
"Lippincott Williams & Wilkins"
Oddly enough says published in 2013 but has no ISBN.

Has the import created duplicates for any book where Amazon's record does not match Goodreads' author naming conventions? e.g. Dr, Sir, etc. plus any variations in spacing/punctuation between initials and any potential spelling errors?
For example, a quick search (actually the "add book/author" tool seems to be more effective than the actual site search) turns up amazon_sable imports for "Sir Arthur Conan Doyle", and how about "Sir Arthur Conan Doyle Sir"?
At least some of these are Kindle editions:
Sarah wrote: "if a Kindle Edition's ASIN matches a book in the GR catalog but does *not* match on title or author, the ASIN will be removed from the GR record and attributed to the new book."
That appears to mean that, since Amazon is always "correct" for Kindle books, the new duplicate authors and books (Sirs, Drs, etc.) have "stolen" the ASINs from the existing, correct records (as per Goodreads naming conventions, i.e. no Sirs). Is that right?
It's too far past my bedtime for me to investigate further but it looks like there will be plenty of fixing to keep everyone entertained there...

Thanks for keeping us updated on how this is going. We're definitely aware of some of these issues and we plan on running a cleanup to bring the Amazon imported data up to Goodreads standards and best practices. We've tried to include some cleanup to remove titles like Dr., PhD, etc. from names, but because the formatting of the data we're receiving is not standardized, our first pass at cleaning the data is definitely not foolproof.
If you see patterns (like some of those mentioned already), definitely report them. That will make them easier to clean up programmatically.
Also, just to clarify: aside from ASINs, user-imported data always takes precedence over feeds, including the one coming from Amazon right now.
It's been a busy weekend, so I haven't had a chance to look through all your comments, yet. I'll check the thread again first thing tomorrow when I'm in the office and see if I can answer any questions or track down any issues.
Thanks for your patience with this giant and imperfect influx of data.

I haven't understood very well what you say here:
ASSIGNMENT OF ISBN/ISBN13 AND ASINS:
For non kindle editions, new books will never "steal" an isbn or isbn13. Because Amazon is considered the authority on ASINs, Kindle Editions are handled somewhat differently. During the import, if a Kindle Edition's ASIN matches a book in the GR catalog but does *not* match on title or author, the ASIN will be removed from the GR record and attributed to the new book.
The problem I'm finding is that, when the ISBN of the book is not already present on GR, the Kindle editions show the ISBN and not the ASIN. That lasts until someone makes an edit, at which point the ISBN "magically" turns into the ASIN. I think users who search for the ISBN wishing to shelve the non-Kindle edition find the Kindle edition instead (this is what I mean by "stealing" the ISBN). It also gives the false impression that the ISBN is present in GR, when it only takes one edit for it to disappear, replaced by the ASIN (as I said before).
Sorry, my explanation is confused, I hope you got it.
Does your statement mean that, with this latest import, this won't happen again?

Now that I know Goodreads plans to run some correction scripts, I will hold off for a week or two on my edits.

What was your chosen error?

Unfortunately, if an author renamed their book on GR between it getting entered into the Amazon catalog and this import, it's likely that the Amazon record will have taken the ASIN.
The ASIN removal on the original GR book is logged in the librarian edit logs as being performed by amazon_sable. If you see any ASIN 'thefts' that should not have occurred, definitely feel free to remove the new book and put the ASIN back on the original record.
Once you've made the correction, the ASIN should remain stable.
Also, we have a log of ALL conflicts that occurred which resulted in a new book being created - We'll hopefully make that queue available to librarians soon in the new year. It will include links to both the old and new book, as well as a summary of the conflict and the changes made.

Not..."
We've tried to minimize the duplication of books - and when we did an initial analysis it looked like there would be a minimal number of new books created that would remove the ASIN from an existing record.
The vast majority of duplicates are going to have arisen because of Kindle Editions that don't have an ASIN entered. This is something we definitely hope to clean up programmatically.
ETA: The changes should be reflected in the librarian log. In the future would it be helpful to have an actual note attached?

Author pseudonyms are something we definitely struggled with for this import. We made our best effort to match not only to an author's primary name but to their pseudonyms as well. We also checked secondary authors when deciding whether something was a match. A match to ANY author name listed on the book was considered a positive match, so hopefully that minimized the author and book duplication.
We also tried to make the matching a little more flexible so that if the import had initials instead of the author's full name (or vice versa), it would still match the author in the GR database.
Author duplication is also something we can clean up programmatically once we have a good sense of what the matching issues were.
ETA: We *also* think that our catalog is pretty freakin' good :-) So we also made our best effort to never never overwrite librarian entered data. That's still our policy and will likely always be our policy - mostly because you guys are awesome.
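For anyone curious what "a match to ANY author name, with initials allowed" could look like, here is a minimal Python sketch. It is not the importer's actual code: `name_tokens`, `names_match` and `book_matches_author` are hypothetical helpers, and the rule used (same number of name parts, identical last part, earlier parts equal or one an initial of the other) is an assumption made for illustration.

```python
import re
from typing import Iterable, List


def name_tokens(name: str) -> List[str]:
    """Split a name into lowercase alphabetic parts, dropping punctuation."""
    return re.findall(r"[^\W\d_]+", name.casefold())


def names_match(a: str, b: str) -> bool:
    """Assumed rule: last parts equal, earlier parts equal or initial-compatible."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or len(ta) != len(tb) or ta[-1] != tb[-1]:
        return False
    for x, y in zip(ta[:-1], tb[:-1]):
        x_is_initial_of_y = len(x) == 1 and y.startswith(x)
        y_is_initial_of_x = len(y) == 1 and x.startswith(y)
        if x != y and not x_is_initial_of_y and not y_is_initial_of_x:
            return False
    return True


def book_matches_author(imported_author: str, names_on_book: Iterable[str]) -> bool:
    """A match to any name on the book (primary, secondary or pseudonym) counts."""
    return any(names_match(imported_author, name) for name in names_on_book)
```

Under this sketch `names_match("A. C. Doyle", "Arthur Conan Doyle")` is true, while a name with an extra or missing part is not matched, which is one way duplicates could still slip through.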

Ah cover images. We had a few of Amazon's generic 'no cover' images blacklisted so they wouldn't be imported, but since images are often uploaded by merchants, we couldn't screen for all of them.
If you see more of these, can you let me know their GR book IDs/ASINs? We might be able to expand the blacklist to get rid of useless non covers.

/book/show/1...
/boo..."
Thanks for the heads up. I'll put that on our data cleanup list.

/book/show/2...
Title includes published by info
Diagnostic Pathology: Familial Cancer Syndrom..."
The way the import worked, if an ISBN existed in the GR system but the title or author didn't match, a new book WITHOUT the isbn was created. That may be why the isbn is empty. The log we have of all these conflicts should help us sort through some of this.

D'oh. We tried to account for titles like Dr. and Sir, but apparently we missed a few patterns. We can combine duplicate authors and their works in an automated way a little down the line.
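As a rough illustration of the kind of cleanup pass being discussed, here is a small Python sketch that drops honorifics and degree labels from an author string. The word lists and the `clean_author_name` helper are assumptions for this example, not the importer's real rules, and the lists only cover patterns reported in this thread.

```python
import re

# Hypothetical word lists, built from examples reported in this thread.
HONORIFICS = {"dr", "sir"}
DEGREES = {"md", "phd", "bsc", "mba"}


def clean_author_name(raw: str) -> str:
    """Drop honorific and degree words wherever they appear in the name."""
    kept = []
    for word in raw.replace(",", " ").split():
        # Compare on letters only, so "M.D." and "Md" both reduce to "md".
        key = re.sub(r"[^a-z]", "", word.casefold())
        if key in HONORIFICS or key in DEGREES:
            continue
        kept.append(word)
    return " ".join(kept)
```

With these lists, "Dr. Vania Nose M.D. Ph.D" becomes "Vania Nose" and "Sir Arthur Conan Doyle Sir" becomes "Arthur Conan Doyle". It would not catch a spaced form such as "Ph D", it does not reorder inverted "Surname, Initials" names, and it would wrongly strip "Md" when that is part of a real name, which is the sort of gap that makes a first pass far from foolproof.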

I haven't understood very well what you say here:
ASSIGNMENT OF ISBN/ISBN13 AND ASINS:
For non kindle editions, new books will never "steal" an isbn or isbn13. Because Amazon is consid..."
Moloch - can you give me an example? I'm not sure I follow. Not sure if this is the same vein as what you're describing, but Amazon's ASIN system works like this:
For print books and books with an isbn10, the ASIN *is* the isbn10. For Kindle editions or books without isbns, amazon assigns an ASIN (usually starting with a B).
GR policy for ASINs: We are not currently tracking ASINs for any books other than Kindle Editions. Wherever possible, we want the isbn and isbn13 for print editions.
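The distinction above can be sketched in Python. `is_valid_isbn10` applies the standard ISBN-10 check digit rule; `classify_amazon_id` is a hypothetical helper, and treating every other 10-character alphanumeric value as an Amazon-assigned ASIN is a simplifying assumption (the post only says those usually start with a B).

```python
def is_valid_isbn10(value: str) -> bool:
    """Return True if value is an ISBN-10 with a correct check digit."""
    value = value.replace("-", "").upper()
    if len(value) != 10:
        return False
    total = 0
    for position, char in enumerate(value):
        if char == "X" and position == 9:
            digit = 10  # "X" stands for 10 and is only legal as the check digit
        elif char in "0123456789":
            digit = int(char)
        else:
            return False
        total += (10 - position) * digit  # weights run from 10 down to 1
    return total % 11 == 0


def classify_amazon_id(value: str) -> str:
    """Guess whether an Amazon identifier is an ISBN-10 or an assigned ASIN."""
    cleaned = value.replace("-", "").upper()
    if is_valid_isbn10(cleaned):
        return "isbn10"
    if len(cleaned) == 10 and cleaned.isalnum():
        return "amazon_assigned"  # assumption: e.g. Kindle Edition ASINs
    return "unknown"
```

For the identifiers quoted in this thread, `0764210181` classifies as `isbn10` and `B00CLMKGAE` as `amazon_assigned`.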

What was the error you were seeing? We can hopefully update our code to handle that particular pattern and fix the error for you in an automated way :-)

/book/show/1...
/book/show/1...
Go to the book page and you'll see the ISBN13
Go to the edit page and you'll see the ASIN
We reported other cases (I think they have been fixed by now) here: /topic/show/...

/book/show/1...
/book/show/1...
Go to the book page and you'll see the ISBN13
Go to the edit page and you'll see the ASIN
We r..."
Ah, interesting. That sort of record (an isbn13 on a book labelled as a Kindle Edition) probably resulted in a new record being created for the physical edition, but without applying the isbn13. In cases where the Kindle Edition doesn't have an isbn10, that may be on the new record.
I think the ultimate goal would be to have the isbn13 applied to the appropriate physical edition so that the ASIN shows on the book page and so that the two editions can be distinguished.

..."
There is no physical edition that goes with the displayed isbn13, from what I understand. That ISBN13 belongs to the Kindle Edition, but Amazon and GR use the ASIN to display the info. Someone could search by the ISBN on Amazon and it will pull up the Kindle edition, but the Amazon page shows the ASIN, not the ISBN. It is just a number to confuse things.

It was "Ph D" restricted to Author. The list is now up to 9,569 names and 479 pages. You'll see if you go to one of the final pages that the import has added dozens of degree initialisms to some names.

See message #10

/book/show/1...
/book/show/1...
Go to the book page and you'll see the ISBN13
Go to the edit page and you'll see..."
They are the ISBNs for the ebook (non Kindle) editions, not the physical editions, like Bookworm says

Don't forget that many title or author edits are done because the title (or author) on Amazon is wrong. The edits we make are to fix duff data and creating a duplicate record with a stolen ASIN is going to be very hard to find without some very detailed logs.
Something like, "ASIN: xxxxxxx was taken from: book-A, given to: book-B, reason: yyyyyyy" might be OK to start.
How many million edits were there during the run?

... Something like, "ASIN: xxxxxxx was taken from: book-A, given to: book-B, reason: yyyyyyy" might be OK to start."
That's precisely the sort of log we hope to get up and running for you all not too long from now. We have all that information ready to go, we just need to create an interface to work with.
As for stats, I don't have them readily available, but I will have a better idea soon. I'll also potentially be able to get you some sort of breakdown for Kindle Editions vs. physical books.
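The log line suggested above maps naturally onto a small record type. This is only a sketch in Python: the `AsinTransfer` class and its field names are made up for illustration, and nothing here reflects how the real conflict log is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsinTransfer:
    """One ASIN move between two book records (hypothetical structure)."""

    asin: str
    from_book: str  # identifier or link for the record that lost the ASIN
    to_book: str    # identifier or link for the record that received it
    reason: str     # short summary of the conflict

    def as_log_line(self) -> str:
        """Render the entry in the format proposed in this thread."""
        return (
            f"ASIN: {self.asin} was taken from: {self.from_book}, "
            f"given to: {self.to_book}, reason: {self.reason}"
        )
```

Storing the entry on both records, not only in a separate queue, would give the new book the audit trail that librarians found missing.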

This one is quite bad:
/book/show/1...
Imported by "amazon_kcw"

/author/show...
/author/list...
The above are examples of no ID imports from Amazon Sable on Dec. 21, 2013. There are many.
They all appear to be no-info duplicates and have:
No IDs.
A publisher of "Capstone Press" (in quotes)
Multiple single or partial names listed as authors (Charles, Jason, Tod, Smith, Burgan, Erwin, etc, etc, etc).

the ISBN is not valid"
An edition of that book on Amazon has an ISBN-10: 0764210181 and also an ASIN: B00B1KKEVI on the same edition. Is that something that happens a lot?
On Goodreads it ended up just using the ISBN and the ASIN has not appeared on GR.
/book/show/1...
amazon_sable isn't involved with that book but it does show that Amazon need more librarians to get their data fixed before we can rely on it.

What about audiobooks? "Print" audiobooks -- the ones on CD or cassette -- and some e-audiobooks have ISBNs, but audiobooks published by Audible are the equivalent of ebooks published on Kindle: they only have ASINs as their id. We've been using ASINs on those audiobooks in GR.

Edit: I could not even find the ASINs I tried with Google.

That is brilliant. Looks like amazon_sable strikes again.
This is page 104 of those editions.
/work/editio...

All of the books (hardcover & paperbacks) for the above have no IDs at all & most have actual author name(s) in the title.

Not cool, sorry! Fortunately these duplicates appear to have some traits in common that we can use to identify them. I'll need to do some more research to be sure, and then we can start working on an automated script to remove them.

All of the books (hardcover & paperbacks) for the above have no IDs at all & most have actual author name(s) in the title."
Thanks for catching this! I'll add it to the automated script for removal.

This one is quite bad:
/book/show/1...
Imported by "amazon_kcw""
Yup, you can report errors here. The amazon_kcw source is actually a different source, but it shares much of the same code as the import we're talking about on this thread (which are reported as amazon_sable). Fixing errors for one source will often fix them for the other source as well.

/author/show...
/author/list...
The above are..."
Very strange, I'm not sure what happened with these Capstone Press books. Have you seen any books without an ID by a different publisher?
I believe we can re-import the Capstone Press books as a way of automatically fixing the ID and author name issues (after fixing the code of course).

/author/show...
/author/list......"
Yes, there have been books without IDs from other publishers.
And why re-import? They are all duplicates of existing records, as far as I can determine. They should probably all be deleted.

/author/show...
/author/list......"
The re-import would actually delete or merge the books as necessary.

This book: The Tenant of Wildfell Hall
has had the ASIN removed and now is associated with The Tenant of Wildfell Hall.
The NEW book is not right, it has even created a new author that isn't real.
**I haven't changed anything so that tech can look at it, because the other book just needs to be deleted.

I was just adding a book to my shelf, The Three Musketeers, when I noticed the authors are incorrect and it isn't combined with the other editions. I suspect it stole the ASIN from an existing edition, but I can't easily tell. It also is one of those that shows the ISBN-13 until you go to edit, at which point the ASIN appears.
This evening we're going to kick off an import of Amazon book data into the Goodreads catalog. This is similar to the one-off imports that have been happening for a while now, but will be much larger in scale. I wanted to make sure some details of the import were clear and make sure there was an obvious place (this thread) to ask questions as they arise.
HOW BOOKS ARE MATCHED:
As books are imported, our system attempts to match them to pre-existing books by numerical identifiers (isbn, isbn13, or asin), title, and authors. When we aren't able to find a good match, a new book will be created and the system will attempt to assign it to the appropriate work. There will be cases in which some of the data between the two catalogs matches (isbn13 for instance) but other data does not (such as title or author). In these cases a new book will be created, but we will also generate a log of the conflict so that we can address the mismatch later. We hope to make this queue of book conflicts available to librarians sometime in the new year to simplify identifying import errors.
ASSIGNMENT OF ISBN/ISBN13 AND ASINS:
For non kindle editions, new books will never "steal" an isbn or isbn13. Because Amazon is considered the authority on ASINs, Kindle Editions are handled somewhat differently. During the import, if a Kindle Edition's ASIN matches a book in the GR catalog but does *not* match on title or author, the ASIN will be removed from the GR record and attributed to the new book.
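To make the Kindle Edition rule above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the importer itself: the `Book` record, the `import_kindle_edition` function and the conflict dictionary are hypothetical, and title and author are compared by simple case-insensitive equality, whereas the real matching is described as more flexible.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Book:
    title: str
    author: str
    asin: Optional[str] = None


def import_kindle_edition(
    catalog: List[Book], incoming: Book, conflicts: List[Dict[str, str]]
) -> Book:
    """Apply the described ASIN rule to one incoming Kindle Edition."""
    if incoming.asin is not None:
        for existing in catalog:
            if existing.asin != incoming.asin:
                continue
            same_title = existing.title.casefold() == incoming.title.casefold()
            same_author = existing.author.casefold() == incoming.author.casefold()
            if same_title and same_author:
                return existing  # full match: no new record is created
            # ASIN matches but title or author does not: log, then move the ASIN.
            conflicts.append(
                {
                    "asin": incoming.asin,
                    "old_title": existing.title,
                    "new_title": incoming.title,
                    "reason": "title mismatch" if not same_title else "author mismatch",
                }
            )
            existing.asin = None
            break
    catalog.append(incoming)
    return incoming
```

Note that in this sketch a renamed book or a differently spelled author is enough to trigger the move, which is why edits made on GR after the book entered the Amazon catalog can lose their ASIN.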
CORRECTING BAD IMPORTS:
In the new year we are planning to run a follow-up script on all imported books in an attempt to clean up some of the data and merge duplicate books/works that were generated during the import.
You're welcome to correct any errors you see and merge any duplicate books or works, but we wanted to let you know that we'll be making an attempt to make fixes in an automated fashion. If you don't feel like scouring these imports, we should have a more consolidated list of corrections needed sometime soon :-)
Thanks so much for all your contributions now and in the past! Let me know if you have any questions.