Perhaps you should read, or if you have read, really internalize the "Falsehoods Programmers Believe About Names" link referenced above. Here are some choice excerpts:
5. People have exactly N names, for any value of N.
11. People’s names are all mapped in Unicode code points.
18. People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
32. People’s names are assigned at birth.
33. OK, maybe not at birth, but at least pretty close to birth.
34. Alright, alright, within a year or so of birth.
35. Five years?
36. You’re kidding me, right?
40. People have names.
What this really means is, depending on the application and how important the name is to you, you will have different requirements. You need to fit your schema to your needs. For some applications, the name may be important, and extra care might need to be taken to reflect the user's entry as accurately as possible. For others, it may simply be a string you expect your payment processor to accept, and that's all you care about.
I already addressed many of these by just advising to have arbitrary-length strings (and maybe just a single one). That takes care of #5, #11, and #20. I don't see how #32-36 are even relevant: your system should just take whatever name they give you. How long they've had the name is irrelevant (as long as you have a mechanism to change it).
#40: Sorry, but how can you not have a name? Good luck getting an ID document without a name. That doesn't make sense.
#11: This just sounds dumb: Unicode has thousands upon thousands of characters. Only with Chinese can I see this possibly being a problem, but even here I thought Unicode did account for that. Unicode was designed to have every conceivable glyph that every language on Earth uses.
> Sorry, but how can you not have a name? Good luck getting an ID document without a name. That doesn't make sense.
That's what 32-36 are referring to: cases where children aren't given a name for an extended period of time. Even without extreme cases, what about a hospital system meant to track newborn children? It necessarily can't rely on a name, because it's not uncommon for children not to be named for a few days. A system such as that might want to track the children by the mother, date of birth, and birth order (in case of twins, triplets, etc.).
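To make that concrete, here is a minimal sketch of such a record, assuming a hypothetical `mother_id` identifier and an in-memory dict standing in for a real database. None of these names come from an actual hospital system; the point is only that the key never involves the name, and the name stays optional until someone supplies it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class NewbornKey:
    """Identifies a newborn without relying on a name (hypothetical schema)."""
    mother_id: str       # assumed identifier for the mother's record
    date_of_birth: date
    birth_order: int     # 1 for a single birth or the first of twins, 2 for the second, ...


@dataclass
class NewbornRecord:
    key: NewbornKey
    name: Optional[str] = None  # may stay empty for days, or longer


# In-memory stand-in for a table keyed on (mother, date of birth, birth order).
registry: Dict[NewbornKey, NewbornRecord] = {}


def register_birth(mother_id: str, dob: date, birth_order: int = 1) -> NewbornKey:
    """Create a record at birth; no name is required."""
    key = NewbornKey(mother_id, dob, birth_order)
    registry[key] = NewbornRecord(key)
    return key


def assign_name(key: NewbornKey, name: str) -> None:
    """Attach (or change) the name whenever the parents provide one."""
    registry[key].name = name


# Twins: same mother and date, distinguished only by birth order.
first = register_birth("M-1001", date(2024, 3, 1), birth_order=1)
second = register_birth("M-1001", date(2024, 3, 1), birth_order=2)
assign_name(first, "Ada")
```

After this runs, the first twin has a name and the second still has `None`, and both records are fully addressable either way.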
> This just sounds dumb: Unicode has thousands upon thousands of characters. Only with Chinese can I see this possibly being a problem, but even here I thought Unicode did account for that. Unicode was designed to have every conceivable glyph that every language on Earth uses.
Just because it was designed to accommodate every conceivable symbol, doesn't mean it currently has every conceivable symbol. What about the symbol Prince used for his name[1]? What about a language that's not in Unicode yet?
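As a rough illustration, here's a sketch of how an application might notice that a submitted name contains characters with no standard Unicode assignment. It uses Python's `unicodedata.category`, which reports `Co` for private-use code points and `Cn` for unassigned ones. The assumption (mine, not anything from the linked post) is that a symbol missing from Unicode would reach your system as a private-use code point from some custom font; it could just as easily arrive as an image or not at all.

```python
import unicodedata
from typing import List


def unmapped_code_points(name: str) -> List[str]:
    """Return characters that have no standard Unicode meaning.

    "Co" = private use (meaning depends on an agreed font or convention),
    "Cn" = not assigned in the Unicode version bundled with this Python.
    """
    return [ch for ch in name if unicodedata.category(ch) in {"Co", "Cn"}]


# An ordinary name with a non-ASCII letter is fully mapped.
ordinary = unmapped_code_points("Zoë")

# U+E000 is in the Private Use Area; it only "means" something
# to systems that share the same custom font or convention.
custom = unmapped_code_points("\ue000")
```

Whether you reject, flag, or just store such input is exactly the kind of decision that depends on what the name is for.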
> There really isn't that much to it: you need to have two strings (and one can be null as I said above)
If someone has a single name, does it go in the first name, or second name spot? Is this enforced, and where?
It's not that you won't generally be served well by your suggestions, it's just that the problem space is complex enough that it's worth thinking about a little before falling back on that. It might be that you don't even need a name, or only need it for billing, and your requirements may change depending on what payment processors expect (is it okay to just save a single string for whatever full name they submit?). These are all worth giving a little thought to up front, because changing the schema after it's in use is always harder, and that goes both ways. Splitting a string into multiple name parts is hard for many of the same reasons listed here.
You don't need to support that. You don't need to cover every single edge case, and that certainly includes people who make up some crazy symbol to be their "name". I'm quite sure Prince didn't file his taxes to the IRS using that symbol. In fact, according to Wikipedia, "1993 also marked the year in which Prince changed his stage name to Prince logo.svg, which was explained as a combination of the symbols for male (♂) and female (♀)." So that wasn't even his name at all! That was just a stage name. If you're making a system for anything official at all, then you don't have to worry about silly stuff like that.
(On the other hand, if you're making a system for artists which has a field for stage name, then you're going to have to figure out a solution to this one. As you can see with Wikipedia, their solution was to use an SVG file of the symbol and insert that into the text as necessary.)
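Here's a small sketch of why splitting after the fact is harder than it looks. `naive_split` is a hypothetical helper, not anything from a real library; it does what a quick migration script would probably do, treating the first whitespace-separated token as the "first name" and the rest as the "last name".

```python
from typing import Optional, Tuple


def naive_split(full_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Split on whitespace: first token is 'first name', the rest 'last name'.

    This is the shortcut being warned about, not a recommendation.
    """
    parts = full_name.split()
    if not parts:
        return (None, None)       # no name at all
    if len(parts) == 1:
        return (parts[0], None)   # single name: which slot is "right"?
    return (parts[0], " ".join(parts[1:]))


# Family name comes first in Chinese names, so the slots end up swapped.
swapped = naive_split("Mao Zedong")   # ("Mao", "Zedong")

# A mononym leaves one slot empty, and the choice of slot is arbitrary.
mononym = naive_split("Sukarno")      # ("Sukarno", None)
```

Going the other way (joining two fields into one display string) has the mirror-image problem, since you have to guess the order. That's the argument for deciding up front which shape you actually need.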
>What about a language that's not in Unicode yet?
This again probably depends on exactly what your application is. If it's a US government database, then you don't need to worry about stuff like that, because they're not going to care about people using some obscure language that isn't covered by Unicode. Is there even such a language? I doubt it, not any that are in actual use.
Don't forget, just about every language has been Latinized now, so you can always fall back to Latinized characters.
>If someone has a single name, does it go in the first name, or second name spot? Is this enforced, and where?
Irrelevant. Do it however you want, it really doesn't matter.
Now again, as I mentioned before, a lot of this depends on exactly what your application is, so it's pretty hard to come up with any ideas or rules without knowing this. Are you making a database for US government use? Or something for shippers to ship packages around the world? And from what country? Something for payment processors? Based out of what country? Something for the Chinese government? The requirements will change depending on the answers.
If the government forms require a first name and a last name, for instance, and that's the law, and you only have one name, then you have to get yourself a second name. If you have a Chinese name and this is for a US government form, then most likely you have to use a Latinized version of your name. The possibilities are endless.
> If it's a US government database, then you don't need to worry about stuff like that, because they're not going to care about people using some obscure language that isn't covered by Unicode.
Why, because immigration isn't tracked in US government databases? Liaisons in other countries aren't tracked in US government databases?
> Is there even such a language? I doubt it, not any that are in actual use.
A simple Google search shows that yes, there are[1]. They may not be common, but you know where uncommonly used languages see use? In names, from people trying to preserve their heritage.
> Now again, as I mentioned before, a lot of this depends on exactly what your application is
Actually, you're saying that now. I've been saying it from the beginning. Actually, that was the point of my first comment, and the point of my second comment, and the point of this comment. You said "Assuming that you're going to require at least two names (no single-name names allowed), then a string for First Name and a string for Last Name should be sufficient." and I was trying to point out that even if we accept the assumption that you need to require at least two names, there are cases where, depending on your purpose, you should really examine what you need and build your schema around that, not some simple rule summarized as two database fields of unlimited-length Unicode text.
I said this explicitly in the first comment with "What this really means is, depending on the application and how important the name is to you, you will have different requirements."
The entire last paragraph of my second comment was about this.
> so it's pretty hard to come up with any ideas or rules without knowing this.
Which is my point, and the point of the linked post about names. You need to know your problem domain, and the data you are expected to encounter. It's also not what you stated in your original comment. I'll just assume you were being a bit overly assertive initially, and it's not really the entirety of your stance, because you're making the opposite argument now. We've wasted a lot of time when you could have just said, "Yeah, if name fidelity is important, know your expected data and make plans for special cases if it's important." Which, again, is exactly what I said in my second comment.