Benford’s Law – How mathematics can detect fraud!

Hello everyone. I’ve got a really surprising fact about numbers to tell you about today. It’s so surprising that even mathematicians are surprised by it, and in fact they can use it to detect fraud. It’s called Benford’s law.

You can discover this for yourself. For example, pick up a newspaper and circle all the numbers. I’m using the Financial Times here because it has a lot of numbers in it. I’m only interested in numbers that can grow naturally, so I’m thinking of things like prices, percentages, and stock shares. I’m not interested in page numbers and telephone numbers, because those are a little too constrained for our purposes.

When I did that, there were 127 numbers on the front page of the Financial Times today, and my question is this: how many of them do you think start with the number 1? You might think that should be about one ninth. Numbers can start with 1, 2, 3, 4, 5, 6, 7, 8, or 9 (we’re not including zero here), so if there are 127 numbers, about 14 of them should start with a 1. In fact, a huge 42 numbers started with a 1. If I keep going: 21 started with a 2, 12 with a 3, 7 with a 4, 6 with a 5, 8 with a 6, 9 with a 7, 11 with an 8, and another 11 with a 9. If I keep going, I’ll find something like this: a surprising 30% of numbers start with the number 1.

This law was first discovered in 1881 by the astronomer Simon Newcomb, and was rediscovered over 50 years later by the physicist Frank Benford while he was working for General Electric. They both noticed it in the same way: pages near the beginning of their log tables were getting more worn and dirty than pages near the end. This suggested that they were looking up numbers that started with a 1 more often than numbers starting with higher digits. Frank Benford then did what we just did. He would take newspapers and look at the numbers inside, and he noticed the same thing: numbers starting with a 1 turn up about 30% of the time. Then he started to look at other statistics: populations, the lengths of rivers, even things like mathematical and physical constants, even street addresses. They all showed the same thing, that numbers starting with a 1 turn up about 30% of the time.

It doesn’t matter what units you use. If you want to measure the length of a river, you can do that in kilometers, miles, feet, or centimeters, and it will still be the same: the number 1 will turn up more often. You can even mix up your statistics. If you’re taking numbers from a newspaper, those statistics are coming from a variety of sources, and yet the law still holds.

This isn’t true if your choice is too random. If you went and generated a bunch of random numbers, you would find that numbers starting with a 1 turn up about 1/9 of the time. It doesn’t work either if your choice is too restrictive. If I’m measuring people’s heights in centimeters, not a lot of numbers are going to start with a 3, except for, oh, that one guy in China.

To show you why numbers starting with a 1 turn up so often, imagine I put one pound in the bank, and it’s a very generous bank: I earn 10% interest every day. So I start with 1 pound, then the next day I have 1 pound 10, then 1 pound 21, then 1 pound 33, and so on. You’ll notice I spend a long time around the low numbers, but then I start to skip through the higher numbers. Then I hit the teens and it happens again: I spend a long time between 10 and 20, but then I start to skip through the higher numbers. Even from this you can see that numbers starting with a 1 turn up about 30% of the time. That’s the general idea.

And it is used to detect fraud. If you’re cooking the books, if you’re making up numbers, people tend to maybe spread the numbers out evenly, or maybe start picking middling digits like fours, fives, and sixes. If these numbers don’t follow Benford’s law, they may be committing fraud.

Now I’m going to show you why this is true, but be warned: this next bit is for serious mathematicians only.

Like I said, this was discovered when Newcomb and Benford noticed that pages at the beginning of their log tables were getting worn and tatty. Some of you may not know what log tables are. In the days before calculators, this is what people used to multiply large numbers.

I imagine you know how powers of 10 work. If I give you a number n, you can return 10 to the power n. So if I give you 2, you return 10 squared: 10 times 10, 100. If I give you 3, you return 10 cubed: 10 times 10 times 10, 1,000. As the powers go up you multiply by 10, so you get 100, 1,000, 10,000, 100,000, and so on. As the powers go down you do the opposite and divide by 10, so you’ve got 1,000, 100, 10, and 1. So 10 to the power 0 is 1.

The log, or logarithm to the base 10, is the reverse of that. If you give me a number, I can give you the original power. For example, the log of 1,000 is 3, the log of 100 is 2, the log of 10 is 1, and the log of 1 is 0. You can connect those points together and work out the log of things in between, so the log of 50 is about 1.7.

Log tables were used to multiply large numbers, because the log of x times y is equal to the log of x plus the log of y. Multiplication now just becomes the addition of logs, and you can reverse the process to get your answer. Log tables only needed to go between 1 and 9, because if you had something larger, like the log of 273, then that was just the log of 2.73 plus the log of 100, and the log of 100 is 2.

Newcomb and Benford noticed that the probability that a number starts with the digit n was the log of (n + 1) minus the log of n, but they couldn’t explain why. I’m going to show you why. The idea, essentially, is that if we collect a lot of data, we want the amount of data between, say, 1 and 2 to be the same as the amount of data between 10 and 20, and I want that to be the same as the amount of data between 100 and 200. The only way you can do that is if the probability that a number starts with a 1 is 30%.

If Benford’s law exists and it’s universal, it should be unaffected by which units we choose to measure things with. I could measure things in meters, feet, miles, or pounds, dollars, Ningis, whatever, and Benford’s law should still hold true. So let’s imagine we have some data like this. What this shows is that I have a lot of data between 25 and 40, but not so much data between 1,250 and 2,000. Now let’s say I want to convert this data into something else, so I’m going to multiply everything by 50; it’s some sort of conversion factor. The data changes and I might get something like this, and what we have now is a lot of data between 1,250 and 2,000. If this were scale invariant, that blue gap would be the same as that red gap.

The blue gap represents data that is 50 times larger than the red gap. If we took the log of this data instead, we would get this, and the blue gap now just represents data that is 1.7 higher than the red gap. Remember, if I want this to be scale invariant, I want the size of the blue gap to be the same as the size of the red gap. So all I’m saying is that I want this to be unaffected by shifts. In other words, the log of the data should be uniform, like this.

Now all I need to do is reverse this to find the original distribution of something that is scale invariant, and this is what we get. From this you can see that if I double, the gap between 1 and 2 is the same as the gap between 2 and 4, which is the same as the gap between 4 and 8. If I times by ten, the gap between 1 and 2 is the same as the gap between 10 and 20, which is the same as the gap between 100 and 200.

Now we want to know the probability that a number starts with a particular digit. Because the pattern repeats, we only need to consider the numbers between 1 and 9. So imagine I threw a dart at this. What’s the probability I’m going to hit a number beginning with a 4? Well, it’s going to be the length of the section between 4 and 5 divided by the total length. The length of the section between 4 and 5 is the log of 5 minus the log of 4, and the total length is the log of 10, which is 1.

In general, the probability that a number starts with the digit n is the log of (n + 1) minus the log of n, and in fact this works for any string of digits. Imagine I wanted to know the probability that a number starts with the string 123. Well, that’s the log of 124 minus the log of 123, which is about 0.4%. Using this you can start to work out the probabilities of digits appearing in other positions, like the second position or the third position, although that quickly becomes 10 percent for each of the 10 digits 0 to 9.

Now, I’m not saying that this is an easy proof, but it is a truly surprising fact about numbers. And if you have been, thanks for watching.
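
The formula and the bank example from the talk can be checked with a short script. What follows is only a minimal Python sketch, not anything shown in the video: the function names (`benford_probability`, `leading_digit`, `compound_interest_leading_ones`) and the constant `FT_FRONT_PAGE_COUNTS` are made up for illustration, and the 1,000-day run length is an arbitrary assumption. The newspaper counts are the ones read out in the talk.

```python
import math

# First-digit counts read out in the talk for one Financial Times front page
# (127 numbers in total). The constant name is illustrative.
FT_FRONT_PAGE_COUNTS = {1: 42, 2: 21, 3: 12, 4: 7, 5: 6, 6: 8, 7: 9, 8: 11, 9: 11}


def benford_probability(prefix: int) -> float:
    """Probability that a number starts with the digit string `prefix`.

    This is the formula quoted in the talk: log10(n + 1) - log10(n).
    It works for a single digit (1..9) or a longer prefix such as 123.
    """
    return math.log10(prefix + 1) - math.log10(prefix)


def leading_digit(x: float) -> int:
    """First significant digit of a positive number.

    Formatting in scientific notation puts the leading digit first,
    e.g. 273 -> '2.730000000000000e+02'.
    """
    return int(f"{x:.15e}"[0])


def compound_interest_leading_ones(days: int = 1000, rate: float = 0.10) -> float:
    """Fraction of days on which the balance starts with a 1.

    Mirrors the bank example: start with 1 pound and add `rate` interest
    every day. The number of days is an assumption chosen for illustration.
    """
    balance = 1.0
    ones = 0
    for _ in range(days):
        if leading_digit(balance) == 1:
            ones += 1
        balance *= 1 + rate
    return ones / days


if __name__ == "__main__":
    total = sum(FT_FRONT_PAGE_COUNTS.values())
    print("digit  observed  predicted by Benford's law")
    for digit, observed in FT_FRONT_PAGE_COUNTS.items():
        predicted = total * benford_probability(digit)
        print(f"{digit:>5}  {observed:>8}  {predicted:>10.1f}")

    share = compound_interest_leading_ones()
    print(f"Share of days with a balance starting with 1: {share:.1%}")
    print(f"P(number starts with 123) = {benford_probability(123):.2%}")
```

A single front page is a small sample, so the observed counts should only be expected to follow the predicted column roughly; the compound-interest run is the cleaner illustration of why the share of leading 1s settles near 30%.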

100 thoughts on “Benford’s Law – How mathematics can detect fraud!”

  • olechkaissocool Post author

    Thank you very much, James. Very cool video.

  • LukeOfTroy Post author

    Not sure I followed all that. I had a different answer in mind. I would have guessed that the effect comes from the fact that you always start at zero. If you pick any 2 random numbers larger than zero, the smaller one is more likely to start with a one than another number. If the largest random number is some combination of nines, like nine hundred and ninety nine, then all the numbers equal or below it start with all numbers with equal frequency. But if the highest random number is anything less than double that, then the number of one starters goes up to more than half, and if the number is anything more than nearly 0.9 of that all nine number, then it is eliminating all the alternatives. On average, the largest number will be five or less, so half the time, the chance of the smaller number starting with a six or more is zero, and the chance of a one is (25.11r)% where r repeats the number for the magnitude. So a large of five hundred would have a hundred hundreds, ten tens, and a one, or (25.11r)% of all numbers that the small number could be that start with one.

    A fifth of the time the large number will start with a 2, which means a hundred and eleven 1 starters out of a maximum of 300 including 0. Every large number between the first 2 and eleven percent below it below it have better than average odds of being a one start. So no matter what the large random is, the small almost always have a better or equal shot at starting with one, and no other number will ever actually have a higher chance. The only way you can beat this is by, as you said, narrowing the data pool, by including a third random number to be the minimum instead of zero, and by ruling it so the largest number cannot exceed smallest number's magnitude. Something like human height in feet is like this, the minimum, or maybe lower quartile, is not zero, and the maximum/ upper quartile does not leave the order of magnitude, so the chances of starting with a one are incredible small.

    But if you were to guess the value of any random thing when you don't know what it is, the chances are that the largest it could possibly be is not all nines, in which case the actual size will be more likely to be a one starter than at least a nine, and whatever the max is will never raise the probability of another starter above the one.

    Put simply, if there's an upper limit, it excludes non ones last, and real world things always have an upper limit.

  • Neil Crabbe Post author

    Ningis? That's fiddling small change, surely?

  • Spyder Post author

    You scared the crap out of me at the start of the video Prof. G

  • Garrett Van Cleef Post author

    Anyone want to take a stab at, say, the distributions of, say, the reported county by county vote counts in MI, WI and PA in the 2016 presidential election? How about the other states as well. This should be interesting 😉

  • Goh Tee We Post author

    We can actually determine whether there is a god by checking that nature's probability follows the bell shape curve, if everything in nature is bell shaped curved, there is no god and things just happen according to nature's frequency, however, if there is a nature thing that do not follow the bell shape curve, then god has a hand  in it and there is a god

  • Gerson Kazumi Post author

    If Benford's law holds universally, when the example of physical constants is given, the number that turns up is ~40%. I imagine that means we are yet to find many more constants, so that the proportion of constants starting with 1 will be closer to 30%.

  • trejkaz Post author

    Holy crap, there's a guy in china who is either 30cm tall or 3m tall?

  • Austin Liu Post author

    Does Benford's Law remain true for different number bases? If you took data that conformed to Benford's law in Base 10, and converted it to Base 7, or Base 9, would it still conform to the law?

  • Abdullah Mustapha Post author

    "that one guy in china"

  • Garrett Van Cleef Post author

    Do this for county by county per state in the US presidential election 😉 Very revealing.

  • Jim Bradley Post author

    Using the FT runs a serious risk of selection bias. The price of sterling in dollars starts with 1 and has for decades. There may be whole columns or even sections containing only such numbers. Now consider the corresponding page from the FT in the period 1915 to 1975: there would be zero such numbers.

  • Rich 91 Post author

    thanks for the fraud advice :p

  • RedsBoneStuff Post author

    Step 1: Add this to the video's link at the top of your screen: &t=1
    Step 2: Keep pressing F5

  • Doug Gale Post author

    It seems like this concept could be exploited for data compression.

  • Drizz Post author

    I've done so many practice problems in my AP Stats class that have to do with Benford's Law..

  • hawkturkey Post author

    Curious. Is it possible that numbers used by people tend to begin with low numbers for psychological reasons? E.g. in setting quantities or prices or purchase amounts. So they aren't random.

  • mournblank Post author

    Pi and Euler itself… is a fraud 😀

  • TheElectra5000 Post author

    Why all the shouting? Couldn't you be closer to the mic? And what happened to the brown paper?

  • Sam Cornwell Post author

    24 seconds before I realised this was from 2011. Up until that point I was wondering what James was getting so damn animated about.

  • bdylanfan90 Post author

    you forgot to circle Bob Dylan, he's number 1

  • endxofxeternity Post author

    Isn't this obvious? Of course 1 turns up more. We start at one and count forward. The lower numbers will always appear first and we cut things short before we get to a higher number. We always use smaller numbers more and then increase from there slowly. It's why small change and small notes get passed around more than higher notes ie. $5 vs $100 note usage.

    I still didn't get the log stuff though. I'm so used to linear number counting that logs confuse me 🙁

  • Random - Post author

    Have you guys ever heard the "the law of near enough"? This is a great example for it. #Vsauce lul

  • TheFunnyFiles Post author

    Who wrote these subtitles ?

  • David van Brecht Post author

    Well explained. Thanks!

  • Alex Post author

    But what about Portugal's plea for bail-out?

  • Tolga şimşek Post author

    holshit i feel like Ive watched magic show. Amazing because I trust the reality of this channel but I have no idea how logs actually create this thing. Feeling dumb lol

  • Waldo Nortjé Post author

    This was explained very good. Thank you

  • Fasteroid Post author

    wtf math

  • Petar Todorov Todorov Post author

    Awesome video ^_^ This might be a stupid question, but I wonder if part of the reason for the high percentage of 1-s in any number has anything to do with the psychological aspect of the whole? Because when we use data, a lot of times we compare something to something whole. Similarly like how the trigonometrical circle works 🙂 It has a radius of 1. Now of course you have infinite possible values for different sin cos tg ctg asin atg etc… but most of them are going to contain 1, because you use 1 circle with radius of 1 and you compare all the numbers with this whole system, if that makes sense…

  • Pepe Jordão Post author

    I see Portugal, I upvote

  • 75ur15 Post author

    That probability of hitting anything with a dart assumes you don't miss entirely

  • Slim Cognito Post author

    This is amazing

  • -Double Negative- Post author

    my paypal has exactly 99c in it. you've failed me this time, math.

  • SuperNovaJinckUFO Post author

    85% of made up statistics are multiples of 5
    75% of made up statistics are multiples of 25
    60% of made up statistics are multiples of 10
    50% of made up statistics are "50%"

  • Cob Canon Post author

    Wow the Chinese are really tall, wait..

  • 0record0 Post author

    I looked at the views in recommendations by youtube on the right side of the window. There were 20 videos and 6 of them began with a number 1 ! That is exactly 30%
    Now I am unhappy and unsatisfied that it actually works XD It takes some fun out of life doesn't it!

  • supadox Post author

    James makes this show worth watching.

  • Pixode Post author

    Impossible! 100% of numbers start with 0!

  • The Wallaby Post author

    I just graduated college(Comp Sci) and I didnt know wth a log actually was until you explained it. The fuck…

  • CarBENbased Post author

    I wonder if some kind of similar appears in other bases…

  • kingjamie2 Post author

    which numbers follow benfords law and which are "too random" or are sampled from a distribution that is "too specified" ?

  • Eliseu Caldeira Post author

    Dear diary, today I learned how to commit fraud.

  • Ratz Alter Post author

    "… you can measure it in km or cm and it will start with 30% 1s"
    I can do better: I bet you, if I measure something in km and in cm (and I measure correctly), both measurements will always start with the same number: How surprising!

  • Jess Stuart Post author

    You could develop a rounding method based on Benford's law.

    Because log(3) is approximately 0.5, if the least significant digit is 3 or less, round down; 4 or higher, round up. 4 is "logarithmically closer" to 10 than to 1. People used to multiply with slide rules. It is really easy to see the logarithmic distribution of digits by comparing the log and regular scales.

  • 43labontepetty Post author

    That's almost exactly a perfect logarithmic scale. lol holy crap.

  • Vicente O Post author

    I know it's too late, but what formula did you use to calculate the interest? I used
    and got these numbers
    1, 1.10, 1.22, 1.34, 1.49, 1.64, etc…
    from 0 to 5

  • Anvilshock Post author

    2:34 – "It doesn't matter what unit you use. You want to measure […] in kilometers […] centimeters […] it will still be the same." Anyone should deeply expect that the most significant digit for anything given in different METRIC units be the same for the same things …

  • charles ranalli Post author

    +Marcus Anderson
    i am reproducing here – a comment (slightly edited) made earlier by one Marcus Anderson

    "+Wowmaxy yes computers expose the slight fallacy in all this. More correctly, in computers, numbers starting with 0 occur 50% of the time, whereas in publishing – no numbers >= 1 start with zero – ever ! That is the missing part of this story. When you do digit analysis of a page of numbers – you must pad all the small numbers with zeros – to make them all the same length. Now you will see the true distribution, and find that only 10% of all n-digit numbers (for a fixed value of n) actually start with 1, as expected. The party trick here is that, in publishing, leading zeros are omitted for brevity, albeit perhaps incorrectly. Engineers recognise this issue and standardise engineering notation with a 3 digit mantissa and an exponent in powers of 3. (powers of 3 ???) Thus numbers should be published in a standardised engineering notation style (eg $0.03K = $30 and 0.16Mm = 160Km (etc))."

    i think Marcus Anderson may have a valid point – as to why the first digit – and only the first digit – of CERTAIN (could-be-better-defined) data sets – show a probability distribution skewed toward the logarithmic – rather than evenly distributed – among the digits 1 thru 9.

    perhaps Mr Grimes might consider getting back to us on the merits of Mr Anderson"s pleadings.
    i think it might shed some more light on this still-rather-murky Benford Business.

    what say -James ?

  • MrGoatflakes Post author

    Actually it's not that surprising if you consider that most numbers measurements are floating point numbers essentially. So we are talking about their leading digits we are not really talking about their magnitude but their significance. So we shift the numbers up and down and in binary the first digit will be always 1, so much so it can be ignored, and a 1 assumed to be there, except in the case of 0 and other special forms.

  • Mark Smith Post author

    Not really surprising give these number go up to a value therefore are very biased in their production. You have to show a big lot of something on shop display (the abundance theory of shop display- you show a lot and people buy a lot) but having a big number is costly for supply reasons- which is more likely they choose 12 or 30- you will choose 12 less than 30 but you can't less than 10 because then it doesn't look like abundance- only hard for mathematicians to understand.

  • Jordan Munroe Post author

    Nature uses multiplication more often than it uses addition.

  • Luke Lambourne Post author

    your bad

  • Xylok Post author

    you just gained a VERY long overdue and VERY well-deserved subscription dude! Been watchin' your vids for years and plan to keep on doing so. It isn't every one who can explain such lofty concepts via such simple means. You're a rare kind of pimp…overall…unique…and i do mean pimp in the stud sense, and i do mean stud in the cool guy as opposed to the horse sense, as opposed to pimp in the sense that would equate to my recently having accused you of possessing and controlling and renting out and likely occasionally slapping a slew of hookers who like all people deserve happiness and basic human respect and dignity and sky and love and friendship and the chance to dance and so on should they desire to.

  • DemonLordChaos Post author

    oh, so it's because every base has 0(all numbers have a leading 0, and base 0 only has one value, being 0), fewer have base 1, fewer base 2, so considering all numbers in all bases, larger numbers get rarer than small numbers.

  • guloguloguy Post author

    …I suddenly realized that I NEED to go out to buy some Lottery tickets!… LOL!!!….

  • kar0x Post author

    ohh snap, mind blown!

  • Recursive Triforce Post author

    3 is everywhere but most times not in the beginning.

  • froop Post author

    Its raining outside…

  • Per Schrijver Post author

    Of the first one hundred numbers in the fibonacci sequence exactly 30 start with a one.

  • john wingate Post author

    What are the odds that SRV would be on the cover of the Financial Times?

  • David Wilkie Post author

    Excellent lecture and demo.
    BL seems to be in the same category as the inverse of scalar fields QM-TIME style, and results in a conglomeration of primes over-lapping by psudo-random quantities in a single connected quality of point(s)?

    I'm not a Mathematician, just seriously interested in the context that includes Benfords Law, because it looks like the mathematical/temporal origin of chemical bonding characteristics?

    Biological complexity, under these conditions, has no final limitation, because it's all about the universal Phi;- wave-package integration of time rates as relative logarithmic scales of the same nature as in the presentation.

    The cause-effect of QM-Time modulation Superspin-Superposition-singularity is the continuous, metastable consistency of here-now temporally in the spectrum of Eternity-now. BL is an identification of some Actuality structural relationships.

  • δτ Post author

    That is a true astronomer: 41.3% is roughly equal to 30.0%.

  • Tomás Ezequiel Morcos Porras Post author

    8:30 I think he forgot pink Gap 😛

  • HaleyHalcyon's shitposts and non-games Post author

    I knew this instinctively, since a lot of numbers have an inverse log prevalence, and in a log scale graph, the area between 1.0 and 2.0 is bigger than the area between 2.0 and 3.0 and so on.

  • Rob MacKenzie Post author

    God I love your videos and enthusiasm James. Thanks for all you put into it.

  • Charlie Staich Post author

    You can hang some blankets off screen to soak up the echo/reverb in the audio 🙂

  • Vivek Kumar Post author

    while watching this , i saw 19 upcoming videos out of which 6 videos were starting with 1 in terms of their duration that equals to about 31 % ,, it's quite amazing

  • Kaczankuku Post author

    So explain why keys of computer keyboard don't break down or vanish as the Benford's law stays?

  • Skeleton Rowdie Post author

    looooool i was like why are you talking so fast? seems like i've been using youtube at 1.25 speed for the last day at least XD you are the first one to exceed realistic proportions at that speed haha

  • Joseph Allen Post author

    Could you please get MORE excited about math.

  • TheBWoods15 Post author

    I've been studying lottery numbers to see if certain digits appear more often than others. One thing I've noticed is that the digit 4 is the least occurring digit in winning lottery numbers. I've kept track of the numbers for only three months, but my theory is still proving to be very strong.

  • Pale Bears Post author

    I stopped watching once I learned the following section is for serious mathematicians only.

  • theboombody Post author

    I remember learning about this in my accounting classes, but not my math classes. It's kind of interesting where you come across certain things.

  • Ant Man Post author

    Didn't realize I was going to learn how to get away with money laundering when I started clicking on math videos today.

  • Petra Lenthe Post author

    Why does at 9:04 the gap between 1 and 2 (3 Blocks) has the same length as the gap between 2 and 4? Shouldn´t it be half the length (1,5 Blocks from 1 to 2)? And the gap between 4 and 10 is 4 blocks long but shouldn´t it be 6 blocks long instead? I really don´t get it. I would be very glad if someone could explain it to me.

  • Music Subic and Cebu Post author

    IS there an easy proof?

  • Music Subic and Cebu Post author

    From Wikipedia . . . "There are illustrative examples and explanations that cover many of the cases where Benford's law applies, though there are many other cases where Benford's law applies that resist a simple explanation."
    Does "resist a simple explanation" mean "don't have an explanation"?

  • Tony Tone Post author

    You better stop abusing that paper pal.

  • eXodiquas Post author

    In binary it's every number except 0. :O

  • polyhistorphilomath Post author

    Ratios near unity are going to start with ‘1’ . Particularly if you filter out any numbers that start with ‘0’!

    Logarithms are similar… if you put everything into floating point format.

    Also left-padded numbers

    00001. Throw it out
    00002. Throw it out
    … …
    09998. Throw it out
    10000. The only real number!

  • xl Post author

    This didn't age well

  • Stefan Travis Post author

    Sounds like a cousin of Zipf's Law. Both aspects of the same underlying law?

  • Ulkomaalainen Post author

    I would venture the guess that (2:45) indeed the results for "meters", "kilometers", and "centimeters" would be very similar when counting starting numbers.

  • Andrew Blechinger Post author

    Finally, a use for the common logarithm.

  • John E Post author

    I built this into production control programs 20 years ago to catch people who were under-reporting gold wastage and pocketing the shortfall.

  • Ascdren Post author

    So what you're saying is that if you're going to be committing fraud you should remember benfords law

  • MrImarcus Post author

    My eyes were hurting watching you NOT blink… How much coffee do you drink???

  • SadNTasteless Vegan Post author

    3:20 "that one guy in China" those dam Chinese and their dammed 3 inches.

  • Jacquelyn Scott Post author

    You look so happy and excited at the fade-in at 3:26! 😀

  • Sport und Anderes Post author

    But when I know that forensics looks out for that distribution then I can counteract it easily.

  • modi X Post author

    Wow, school failed to teach me how log really works, and you explain me it in 10 minutes and I understand o.o

  • Leo179 Post author

    views: 411k
    dislikes: 98
    likes: 7.5k
    comments: 803
    not quite (although, to be fair)
    subs: 1_99k

  • Nikola Negovanovic Post author

    It is because natural quantities are distributed geometrically, meaning that the ratios instead of the differences
    of the numbers are what matter.

  • Steve Zelaznik Post author

    I looked at the number of views for all the recommended videos to the right of my screen. (Your results may vary). 12 out of the 39 videos had a number of views that started with the number "1".

  • steve mcdonald Post author

    greetings Singingbannana..i have a question for speaking of the Triple Alpha Process, i think that astrophysicists have been fudging their numbers in order to make the process appear to account for the carbon in the universe, in order to prop up their naturalist theory..what are your thoughts on this?..thank you in advance!…a big fan!
