You can sort in linear time!
TLDR: Yes you can sort numbers in linear time. Yes, including floats! The proof that you can sort in $O(n \log n)$ time is still true, but that assumes you're sorting on the infinite set of real numbers (or integers.) We can sort in linear time because we work with finite sets: Floats and ints. The algorithm is very simple, and is called counting sort (or bin sort or radix sort.)
See the Python notebook wrapping an unoptimized Rust countsort implementation:

Is this truly $O(n)$? Yes!
 Counting sort is not a randomized algorithm, it always outputs the correct solution.
 Counting sort is not a parallelized algorithm. (And you can't use parallelism to speed past a worstcase bigO run time anyway!)
 Counting sort is not an amortized algorithm. It is truly $O(n)$, not 'expected' or 'amortized' $O(n)$.
 Counting sort is technically $Θ(n)$, that is, the worst and best case running time is linear.
 This can be used in the real world.
 However, it is not an inplace sorting algorithm.
ok, how does counting sort work?
The idea of the algorithm is that you count how many times a number appears in a list.
 Get input a list
L
of numbers from some finite setK
.  Instantiate a dictionary
S
with zeros, with one element for each item inK
.  For each item in
L
, increment the corresponding element inS
by one.  Then, for each item
k in K
, outputk
a total ofL[k]
times each.
Here's a simple implementation in Python.
set_of_numbers_to_sort = list(range(0, 256))
counted_numbers = {kk : 0 for kk in set_of_numbers_to_sort}
numbers_in = input("Enter values in [0, 255] separated by space:\n")
numbers_in = numbers_in.split(' ')
for number in numbers_in:
counted_numbers[int(number)] += 1
for number in set_of_numbers_to_sort:
for _ in range(counted_numbers[number]):
print(number)
Note: This example implementation uses a dict for clarity and generality, so it's technically not guaranteed Θ(k+n)!
a realworld example of counting sort on integers
TLDR: Imagine you have a 1billion letter corpus. Here, n = 1000000000 and k=26. So, the O(n+k) countingsort algorithm will be faster than the conventional $O(n \log n)$ sorting algorithms.
So, countingsort can be used in the real world when you have to sort a very large list of values that belong to a relatively small set. This could be 8bit or 16bit floats or ints, or even smaller sets like the 26character alphabet.
For this example, let's say you wanted to see the frequency of letters in a 1billioncharacter English text. You can count each character to see how many of the 26 letters appear. This is equivalent to sorting.

Get an input
L
which is the 1billioncharacter text, whereK = {A, B, C, ..., X, Y, Z}
. 
Instantiate the dict
S = {A : 0, B : 0, C : 0, ..., Z : 0}
. 
For each character in in
L
, increase the corresponding element inS
by one.
E.g. For 'C', we get
S = {A : 0, B : 0, C : 1, ..., Z : 0}
. 
E.g. And then if we get an A for the next, we get
S = {A : 1, B : 0, C : 1, ... }
.


And then print each element from S.
This algorithm is O(L + K)
. This is also written as O(n + k)
, where k
is the size of the set we want to sort.
tell me more about sorting
So far we assume you need to perfectly sort an arbitrary list of values.
You can speed things up if:
 You know something about the distribution of values, or
 You know something about the application of the sort.
If you're sorting floats, but you know most of your values are NaN or 0.0, you can use countingsort for NaN and 0.0, and use a traditional sort on the rest.
What if your sort doesn't need to be super accurate? Say, for a very niche case, you want to sort a histogram on the logarithm of some list of values. Then, you can sort your float64s on the 11bit exponent, which means k = 2^11 = 2048
, which means you can use countingsort!
okay so what's the takeaway here
Well, I've only ever written sorting algorithms in jobinterviews and university exams. I'm happy to say some of them were linear, and I'm very happy to say I've never had to performanceengineer things like this.
There is a small rift between the model of computing used in computerscience and the actual computers we use. The space between that rift is really cool, and leads to fun facts like "you can sort in linear time".
Remember to like, comment, and subscribe* for more fun blogposts, such as:
 "Sorry Amdahl, but parallel computing DOES give you superlinear speedup"
 "P and NP don't matter if you're not in aerospace because randomized algorithms are magic"
 "Moore's Law isn't exponential, and it's not sigmoidal either. It's a distribution, I'm afraid we're nearing it's peak, and I'm frightened about the world beyond it: A Short Treatise on Why 640KB ought to be enough for anyone because you won't have a choice in 200 years"
*This is a joke, I'm not actually writing these blogposts, and you can't like, comment, or subscribe.