# Benford’s Law – How mathematics can detect fraud!


Hello everyone. I've got a really surprising fact about numbers to tell you about today. It's so surprising that even mathematicians are surprised by it. In fact, it can be used to detect fraud, and it's called Benford's law.

You can discover this for yourself. For example, pick up a newspaper and circle all the numbers. I'm using the Financial Times here because it has a lot of numbers in it, and I'm only interested in numbers that can grow naturally, so I'm thinking of things like prices, percentages and stocks and shares. I'm not interested in page numbers and telephone numbers, because those are a little too constrained for our purposes. If I do that, there were 127 numbers on the front page of the Financial Times today, and my question is this: how many of them do you think start with the number 1?

Now, you might think that should be about one ninth. Numbers can start with 1, 2, 3, 4, 5, 6, 7, 8 or 9 (we're not including zero here), so if there are 127 numbers, about 14 of them should start with the number 1. In fact, a huge 42 numbers started with a 1. If I keep going: 21 started with a 2, 12 with a 3, 7 with a 4, 6 with a 5, 8 with a 6, 9 with a 7, 11 with an 8, and another 11 with a 9. If I keep going, I'll find something like this: a surprising 30% of numbers start with the number 1.

This law was first discovered in 1881 by the astronomer Simon Newcomb, and was rediscovered over 50 years later by the physicist Frank Benford while he was working for General Electric. They both noticed it in the same way: the pages near the beginning of their log tables were getting more worn and dirty than the pages near the end, which suggested that they were looking up numbers that started with a 1 more often than numbers that started with higher digits. Frank Benford then did what we just did: he would take newspapers and look at the numbers
inside, and he noticed the same thing: numbers starting with a 1 turn up about 30% of the time. Then he started to look at other statistics: populations, the lengths of rivers, mathematical and physical constants, even street addresses. They all showed the same thing: numbers starting with a 1 turn up about 30% of the time.

It doesn't matter what units you use. If you want to measure the length of a river, you can do that in kilometers, miles, feet or centimeters and it will still be the same: the number 1 will turn up more often. You can even mix up your statistics. If you're taking numbers from a newspaper, those statistics come from a variety of sources, and yet the law still holds.

Now, this isn't true if your choice is too random. If you went to random.org and generated a bunch of numbers, you'd find that numbers starting with a 1 turn up about 1/9 of the time. It doesn't work either if your choice is too restrictive. If I'm measuring people's heights in centimeters, not a lot of numbers are going to start with a 3, except for, oh, that one guy in China.

To show you why numbers starting with a 1 turn up so often, imagine I put one pound in the bank, and it's a very generous bank: I earn 10% interest every day. So I start with £1, the next day I have £1.10, then £1.21, then £1.33, and so on. You'll notice I spend a long time around the low numbers, but then I start to skip through the higher numbers. Then I hit the teens and it happens again: I spend a long time between 10 and 20, but then I start to skip through the higher numbers. Even from this you can see that numbers starting with a 1 turn up about 30% of the time. That's the general idea.

And it is used to detect fraud. If you're cooking the books, if you're making up numbers, people tend to spread the numbers out evenly, or maybe start picking middling numbers like fours, fives and sixes. And if these
numbers don't follow Benford's law, they may be committing fraud.

Now I'm going to show you why this is true, but be warned: this next bit is for serious mathematicians only.

Like I said, this was discovered when Newcomb and Benford noticed that the pages at the beginning of their log tables were getting worn and tatty. Some of you may not know what log tables are. In the days before calculators, this is what people used to multiply large numbers.

I imagine you know how powers of 10 work: if I give you a number n, you can return 10 to the power n. If I give you 2, you return 10 squared, 10 times 10, which is 100. If I give you 3, you return 10 cubed, 10 times 10 times 10, which is 1,000. As the powers go up you multiply by 10, so you get 100, 1,000, 10,000, 100,000 and so on. As the powers go down you do the opposite and divide by 10, so you get 1,000, 100, 10 and 1. So 10 to the power 0 is 1.

The log, or logarithm, to the base 10 is the reverse of that: if you give me a number, I can give you the original power. For example, the log of 1,000 is 3, the log of 100 is 2, the log of 10 is 1 and the log of 1 is 0. You can connect those points together and work out the log of things in between, so the log of 50 is about 1.7.

Log tables were used to multiply large numbers because the log of x times y is equal to the log of x plus the log of y. Multiplication just becomes the addition of logs, and you can then reverse the process to get your answer. Log tables only needed to cover the numbers between 1 and 9, because if you had something larger, like the log of 273, that was just the log of 2.73 plus the log of 100, and the log of 100 is 2.

Newcomb and Benford noticed that the probability that a number starts with the digit n is the log of (n + 1) minus the log of n, but they couldn't explain why. I'm going to show you why. The idea, essentially, is that if we collect a lot of data, we want the amount of data between, say, 1 and 2 to be the same as the amount of data between 10 and 20, and I want that
to be the same as the amount of data between 100 and 200. The only way you can do that is if the probability of starting with a 1 is 30%.

If Benford's law exists and it's universal, it should be unaffected by which units we choose to measure things with. I could measure things in meters, feet or miles, or in pounds, dollars, ningis, whatever, and Benford's law should still hold true.

So let's imagine we have some data like this. What this shows is that I have a lot of data between 25 and 40, but not so much data between 1,250 and 2,000. Now let's say I want to convert this data into something else, so I multiply everything by 50; it's some sort of conversion factor. The data changes and I might get something like this: now I have a lot of data between 1,250 and 2,000. If this were scale invariant, that blue gap would be the same as that red gap.

So the blue gap represents data that is 50 times larger than the red gap. If we took the log of this data instead, we would get this, and the blue gap now just represents data that is 1.7 higher than the red gap. Remember, if I want this to be scale invariant, I want the size of the blue gap to be the same as the size of the red gap. So all I'm saying is that I want this to be unaffected by shifts. In other words, the log of the data should be uniform, like this.

Now all I need to do is reverse this to find the original distribution of something that is scale invariant, and this is what we get. From this you can see that if I double, the gap between 1 and 2 is the same as the gap between 2 and 4, which is the same as the gap between 4 and 8. If I times by 10, the gap between 1 and 2 is the same as the gap between 10 and 20, which is the same as the gap between 100 and 200.

Now we want to know the probability that a number starts with a particular digit. Because the pattern repeats, we only need to consider the numbers between 1 and 9. So imagine I threw a dart at this.
What's the probability that I hit a number beginning with a 4? It's the length of the section between 4 and 5 divided by the total length. The length of the section between 4 and 5 is the log of 5 minus the log of 4, and the total length is the log of 10, which is 1. In general, the probability that a number starts with the digit n is the log of (n + 1) minus the log of n.

In fact, this works for any string of digits. Imagine I wanted to know the probability that a number starts with the string 1 2 3. That's the log of 124 minus the log of 123, which is about 0.4%. Using this, you can start to work out the probabilities of digits appearing in other positions, like the second position or the third position, although that quickly becomes 10% for each of the 10 digits 0 to 9.

Now, I'm not saying that this is an easy proof, but it is a truly surprising fact about numbers. And if you have been, thanks for watching.
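For readers who want to try the ideas from the video, here is a minimal Python sketch. It is not code from the video: the function names (`benford_probability`, `first_digit`, `digit_shares`, `interest_balances`, `log_uniform_sample`) are illustrative choices, and the sample size, the seed and the six-decade range are assumptions. It computes the log formula, compares it with the front-page counts quoted in the transcript, replays the "10% interest per day" bank account, and checks that data whose logarithm is uniform keeps roughly the same leading-digit shares after multiplying by a conversion factor of 50.

```python
import math
import random


def benford_probability(prefix: int) -> float:
    """P(a number starts with the digits of `prefix`) = log10(prefix + 1) - log10(prefix)."""
    if prefix < 1:
        raise ValueError("prefix must be a positive integer")
    return math.log10(prefix + 1) - math.log10(prefix)


def first_digit(x: float) -> int:
    """Leading non-zero digit of a non-zero number, read from its decimal repr."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def digit_shares(values) -> dict:
    """Share of the given values that start with each digit 1..9."""
    counts = {d: 0 for d in range(1, 10)}
    n = 0
    for v in values:
        counts[first_digit(v)] += 1
        n += 1
    return {d: c / n for d, c in counts.items()}


# Front-page counts quoted in the transcript (leading digit -> how many numbers).
newspaper_counts = {1: 42, 2: 21, 3: 12, 4: 7, 5: 6, 6: 8, 7: 9, 8: 11, 9: 11}
newspaper_total = sum(newspaper_counts.values())


def interest_balances(days: int, rate: float = 0.10, start: float = 1.0):
    """Daily balances for the 'generous bank' example: 10% interest every day."""
    balance = start
    for _ in range(days):
        yield balance
        balance *= 1 + rate


def log_uniform_sample(n: int, decades: int = 6, seed: int = 0) -> list:
    """Values whose log10 is uniform on [0, decades): the scale-invariant case.

    The seed and the number of decades are arbitrary assumptions for this sketch.
    """
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, decades) for _ in range(n)]


if __name__ == "__main__":
    sample = log_uniform_sample(20_000)
    original = digit_shares(sample)
    rescaled = digit_shares(v * 50 for v in sample)  # conversion factor of 50
    interest = digit_shares(interest_balances(1000))

    print("digit  Benford  newspaper  interest  sample  sample*50")
    for d in range(1, 10):
        print(
            f"{d:>5}  {benford_probability(d):7.3f}  "
            f"{newspaper_counts[d] / newspaper_total:9.3f}  "
            f"{interest[d]:8.3f}  {original[d]:6.3f}  {rescaled[d]:9.3f}"
        )
    print(f"P(starts with 123) = {benford_probability(123):.4%}")
```

The leading digit is read from the number's printed form rather than from floor-and-divide arithmetic, which sidesteps rounding trouble for values very close to a power of 10. A single newspaper page with 127 numbers is a small sample, so its shares should only be expected to follow the formula loosely.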

olechkaissocoolPost authorThank you very much, James. Very cool video.

LukeOfTroyPost authorNot sure I followed all that. I had a different answer in mind. I would have guessed that the effect comes from the fact that you always start at zero. If you pick any 2 random numbers larger than zero, the smaller one is more likely to start with a one than another number. If the largest random number is some combination of nines, like nine hundred and ninety nine, then all the numbers equal or below it start with all numbers with equal frequency. But if the highest random number is anything less than double that, then the number of one starters goes up to more than half, and if the number is anything more than nearly 0.9 of that all nine number, then it is eliminating all the alternatives. On average, the largest number will be five or less, so half the time, the chance of the smaller number starting with a six or more is zero, and the chance of a one is (25.11r)% where r repeats the number for the magnitude. So a large of five hundred would have a hundred hundreds, ten tens, and a one, or (25.11r)% of all numbers that the small number could be that start with one.

A fifth of the time the large number will start with a 2, which means a hundred and eleven 1 starters out of a maximum of 300 including 0. Every large number between the first 2 and eleven percent below it below it have better than average odds of being a one start. So no matter what the large random is, the small almost always have a better or equal shot at starting with one, and no other number will ever actually have a higher chance. The only way you can beat this is by, as you said, narrowing the data pool, by including a third random number to be the minimum instead of zero, and by ruling it so the largest number cannot exceed smallest number's magnitude. Something like human height in feet is like this, the minimum, or maybe lower quartile, is not zero, and the maximum/ upper quartile does not leave the order of magnitude, so the chances of starting with a one are incredible small.

But if you were to guess the value of any random thing when you don't know what it is, the chances are that the largest it could possibly be is not all nines, in which case the actual size will be more likely to be a one starter than at least a nine, and whatever the max is will never raise the probability of another starter above the one.

Put simply, if there's an upper limit, it excludes non ones last, and real world things always have an upper limit.

Neil CrabbePost authorNingis? That's fiddling small change, surely?

SpyderPost authorYou scared the crap out of me at the start of the video Prof. G

Garrett Van CleefPost authorAnyone want to take a stab at, say, the distributions of, say, the reported county by county vote counts in MI, WI and PA in the 2016 presidential election? How about the other states as well. This should be interesting 😉

Goh Tee WePost authorWe can actually determine whether there is a god by checking that nature's probability follows the bell shape curve, if everything in nature is bell shaped curved, there is no god and things just happen according to nature's frequency, however, if there is a nature thing that do not follow the bell shape curve, then god has a hand in it and there is a god

Gerson KazumiPost authorIf Benford's law holds universally, then since the figure that turns up in the example of physical constants is ~40%, I imagine that means we have yet to find many more constants, so that the proportion of constants starting with 1 will end up closer to 30%

trejkazPost authorHoly crap, there's a guy in china who is either 30cm tall or 3m tall?

Austin LiuPost authorDoes Benford's Law remain true for different number bases? If you took data that conformed to Benford's law in Base 10, and converted it to Base 7, or Base 9, would it still conform to the law?

Abdullah MustaphaPost author"that one guy in china"

Garrett Van CleefPost authorDo this for county by county per state in the US presidential election 😉 Very revealing.

Jim BradleyPost authorUsing the FT runs a serious risk of selection bias. The price of sterling in dollars starts with 1 and has for decades. There may be whole columns or even sections containing only such numbers. Now consider the corresponding page from the FT in the period 1915 to 1975: there would be zero such numbers.

Rich 91Post authorthanks for the fraud advice :p

RedsBoneStuffPost authorStep 1: Add this to the video's link at the top of your screen: &t=1

Step 2: Keep pressing F5

Doug GalePost authorIt seems like this concept could be exploited for data compression.

DrizzPost authorI've done so many practice problems in my AP Stats class that have to do with Benford's Law..

hawkturkeyPost authorCurious. Is it possible that numbers used by people tend to begin with low numbers for psychological reasons? E.g. in setting quantities or prices or purchase amounts. So they aren't random.

mournblankPost authorPi and Euler itself… is a fraud 😀

TheElectra5000Post authorWhy all the shouting? Couldn't you be closer to the mic? And what happened to the brown paper?

Tristan JohnsonPost author3:24

TRIIIGGGGGGGGGGGGEEEERRRRRRREEEEEEEEEEEDDDD

Sam CornwellPost author24 seconds before I realised this was from 2011. Up until that point I was wondering what James was getting so damn animated about.

bdylanfan90Post authoryou forgot to circle Bob Dylan, he's number 1

endxofxeternityPost authorIsn't this obvious? Of course 1 turns up more. We start at one and count forward. The lower numbers will always appear first and we cut things short before we get to a higher number. We always use smaller numbers more and then increase from there slowly. It's why small change and small notes get passed around more than higher notes ie. $5 vs $100 note usage.

I still didn't get the log stuff though. I'm so used to linear number counting that logs confuse me 🙁

Random -Post authorHave you guys ever heard the "the law of near enough"? This is a great example for it. #Vsauce lul

TheFunnyFilesPost authorWho wrote these subtitles ?

David van BrechtPost authorWell explained. Thanks!

AlexPost authorBut what about Portugal's plea for bail-out?

Tolga şimşekPost authorholshit i feel like Ive watched magic show. Amazing because I trust the reality of this channel but I have no idea how logs actually create this thing. Feeling dumb lol

Waldo NortjéPost authorThis was explained very good. Thank you

FasteroidPost authorwtf math

Petar Todorov TodorovPost authorAwesome video ^_^ This might be a stupid question, but I wonder if part of the reason for the high percentage of 1-s in any number has anything to do with the psychological aspect of the whole? Because when we use data, a lot of times we compare something to something whole. Similarly like how the trigonometrical circle works 🙂 It has a radius of 1. Now of course you have infinite possible values for different sin cos tg ctg asin atg etc… but most of them are going to contain 1, because you use 1 circle with radius of 1 and you compare all the numbers with this whole system, if that makes sense…

Pepe JordãoPost authorI see Portugal, I upvote

75ur15Post authorThat probability of hitting anything with a dart assumes you don't miss entirely

Slim CognitoPost authorThis is amazing

-Double Negative-Post authormy paypal has exactly 99c in it. you've failed me this time, math.

SuperNovaJinckUFOPost author85% of made up statistics are multiples of 5

75% of made up statistics are multiples of 25

60% of made up statistics are multiples of 10

50% of made up statistics are "50%"

spanzikPost authorn1

Cob CanonPost authorWow the Chinese are really tall, wait..

0record0Post authorI looked at the views in recommendations by youtube on the right side of the window. There were 20 videos and 6 of them began with a number 1 ! That is exactly 30%

Now I am unhappy and unsatisfied that it actually works XD It takes some fun out of life doesn't it!

supadoxPost authorJames makes this show worth watching.

PixodePost authorImpossible! 100% of numbers start with 0!

The WallabyPost authorI just graduated college(Comp Sci) and I didnt know wth a log actually was until you explained it. The fuck…

CarBENbasedPost authorI wonder if some kind of similar appears in other bases…

kingjamie2Post authorwhich numbers follow benfords law and which are "too random" or are sampled from a distribution that is "too specified" ?

Eliseu CaldeiraPost authorDear diary, today I learned how to commit fraud.

Ratz AlterPost author"… you can measure it in km or cm and it will start with 30% 1s"

I can do better: I bet you, if I measure something in km and in cm (and I measure correctly), both measurements will always start with the same number: How surprising!

Jess StuartPost authorYou could develop a rounding method based on Benford's law.

Because log(0.5) is approximately -0.3, if the least significant digit is 3 or less, round down; 4 or higher, round up. 4 is "logarithmically closer" to 10 than to 1. People used to multiply with slide rules. It is really easy to see the logarithmic distribution of digits by comparing the log and regular scales.

43labontepettyPost authorThat's almost exactly a perfect logarithmic scale. lol holy crap.

Vicente OPost authorI know it's too late, but what formula did you use to calculate the interest? I used

Y=1(1+(.1/365))^(365X)

and got these numbers

1, 1.10, 1.22, 1.34, 1.49, 1.64, etc…

from 0 to 5

AnvilshockPost author2:34 – "It doesn't matter what unit you use. You want to measure […] in kilometers […] centimeters […] it will still be the same." Anyone should deeply expect that the most significant digit for anything given in different METRIC units be the same for the same things …

charles ranalliPost author+Marcus Anderson

i am reproducing here – a comment (slightly edited) made earlier by one Marcus Anderson

"+Wowmaxy yes computers expose the slight fallacy in all this. More correctly, in computers, numbers starting with 0 occur 50% of the time, whereas in publishing – no numbers >= 1 start with zero – ever ! That is the missing part of this story. When you do digit analysis of a page of numbers – you must pad all the small numbers with zeros – to make them all the same length. Now you will see the true distribution, and find that only 10% of all n-digit numbers (for a fixed value of n) actually start with 1, as expected. The party trick here is that, in publishing, leading zeros are omitted for brevity, albeit perhaps incorrectly. Engineers recognise this issue and standardise engineering notation with a 3 digit mantissa and an exponent in powers of 3. (powers of 3 ???) Thus numbers should be published in a standardised engineering notation style (eg $0.03K = $30 and 0.16Mm = 160Km (etc))."

i think Marcus Anderson may have a valid point – as to why the first digit – and only the first digit – of CERTAIN (could-be-better-defined) data sets – show a probability distribution skewed toward the logarithmic – rather than evenly distributed – among the digits 1 thru 9.

perhaps Mr Grimes might consider getting back to us on the merits of Mr Anderson"s pleadings.

i think it might shed some more light on this still-rather-murky Benford Business.

what say -James ?

PINEAPPLES !!!

Wolfy the wolfPost authorI FOUND YOUR CHANNEL!!

MrGoatflakesPost authorActually it's not that surprising if you consider that most ~~numbers~~ measurements are floating point numbers essentially. So when we are talking about their leading digits we are not really talking about their magnitude but their significance. So we shift the numbers up and down and in binary the first digit will be always 1, so much so it can be ignored, and a 1 assumed to be there, except in the case of 0 and other special forms.

Mark SmithPost authorNot really surprising given these numbers go up to a value and therefore are very biased in their production. You have to show a big lot of something on shop display (the abundance theory of shop display: you show a lot and people buy a lot) but having a big number is costly for supply reasons. Which is more likely they choose, 12 or 30? You will choose 12, less than 30, but you can't choose less than 10 because then it doesn't look like abundance. Only hard for mathematicians to understand.

Jordan MunroePost authorNature uses multiplication more often than it uses addition.

Luke LambournePost authoryour bad

XylokPost authoryou just gained a VERY long overdue and VERY well-deserved subscription dude! Been watchin' your vids for years and plan to keep on doing so. It isn't every one who can explain such lofty concepts via such simple means. You're a rare kind of pimp…overall…unique…and i do mean pimp in the stud sense, and i do mean stud in the cool guy as opposed to the horse sense, as opposed to pimp in the sense that would equate to my recently having accused you of possessing and controlling and renting out and likely occasionally slapping a slew of hookers who like all people deserve happiness and basic human respect and dignity and sky and love and friendship and the chance to dance and so on should they desire to.

DemonLordChaosPost authoroh, so it's because every base has 0(all numbers have a leading 0, and base 0 only has one value, being 0), fewer have base 1, fewer base 2, so considering all numbers in all bases, larger numbers get rarer than small numbers.

guloguloguyPost author…I suddenly realized that I NEED to go out to buy some Lottery tickets!… LOL!!!….

kar0xPost authorohh snap, mind blown!

Recursive TriforcePost author3 is everywhere but most times not in the beginning.

froopPost authorIts raining outside…

Per SchrijverPost authorOf the first one hundred numbers in the fibonacci sequence exactly 30 start with a one.

john wingatePost authorWhat are the odds that SRV would be on the cover of the Financial Times?

David WilkiePost authorExcellent lecture and demo.

BL seems to be in the same category as the inverse of scalar fields QM-TIME style, and results in a conglomeration of primes over-lapping by psudo-random quantities in a single connected quality of point(s)?

I'm not a Mathematician, just seriously interested in the context that includes Benfords Law, because it looks like the mathematical/temporal origin of chemical bonding characteristics?

Biological complexity, under these conditions, has no final limitation, because it's all about the universal Phi;- wave-package integration of time rates as relative logarithmic scales of the same nature as in the presentation.

___The cause-effect of QM-Time modulation Superspin-Superposition-singularity is the continuous, metastable consistency of here-now temporally in the spectrum of Eternity-now. BL is an identification of some Actuality structural relationships.

δτPost authorThat is a true astronomer: 41.3% is roughly equal to 30.0%.

Tomás Ezequiel Morcos PorrasPost author8:30 I think he forgot pink Gap 😛

HaleyHalcyon's shitposts and non-gamesPost authorI knew this instinctively, since a lot of numbers have an inverse log prevalence, and in a log scale graph, the area between 1.0 and 2.0 are bigger than the area between 2.0 end 3.0 and so on.

Rob MacKenziePost authorGod I love your videos and enthusiasm James. Thanks for all you put into it.

Charlie StaichPost authorYou can hang some blankets off screen to soak up the echo/reverb in the audio 🙂

Vivek KumarPost authorwhile watching this , i saw 19 upcoming videos out of which 6 videos were starting with 1 in terms of their duration that equals to about 31 % ,, it's quite amazing

KaczankukuPost authorSo explain why keys of computer keyboard don't break down or vanish as the Benford's law stays?

Skeleton RowdiePost authorlooooool i was like why are you talking so fast? seems like i've been using youtube at 1.25 speed for the last day at least XD you are the first one to exceed realistic proportions at that speed haha

Joseph AllenPost authorCould you please get MORE excited about math.

TheBWoods15Post authorI've been studying lottery numbers to see if certain digits appear more often than others. One thing I've noticed is that the digit 4 is the least occurring digit in winning lottery numbers. I've kept track of the numbers for only three months, but my theory is still proving to be very strong.

Pale BearsPost authorI stopped watching once I learned the following section is for serious mathematicians only.

Gotem.

theboombodyPost authorI remember learning about this in my accounting classes, but not my math classes. It's kind of interesting where you come across certain things.

Ant ManPost authorDidn't realize I was going to learn how to get away with money laundering when I started clicking on math videos today.

Petra LenthePost authorWhy does at 9:04 the gap between 1 and 2 (3 Blocks) has the same length as the gap between 2 and 4? Shouldn´t it be half the length (1,5 Blocks from 1 to 2)? And the gap between 4 and 10 is 4 blocks long but shouldn´t it be 6 blocks long instead? I really don´t get it. I would be very glad if someone could explain it to me.

Music Subic and CebuPost authorIS there an easy proof?

Music Subic and CebuPost authorFrom Wikipedia . . . "There are illustrative examples and explanations that cover many of the cases where Benford's law applies, though there are many other cases where Benford's law applies that resist a simple explanation."

Does "resist a simple explanation" mean "don't have an explanation?

Tony TonePost authorYou better stop abusing that paper pal.

eXodiquasPost authorIn binary it's every number except 0. :O

Coincidence?

polyhistorphilomathPost authorRatios near unity are going to start with ‘1’ . Particularly if you filter out any numbers that start with ‘0’!

Logarithms are similar… if you put everything into floating point format.

Also left-padded numbers

00001. Throw it out

00002. Throw it out

… …

09998. Throw it out

10000. The only real number!

xlPost authorThis didn't age well

Stefan TravisPost authorSounds like a cousin of Zipf's Law. Both aspects of the same underlying law?

UlkomaalainenPost authorI would venture the guess that (2:45) indeed the results for "meters", "kilometers", and "centimeters" would be very similar when counting starting numbers.

Andrew BlechingerPost authorFinally, a use for the common logarithm.

John EPost authorI built this into production control programs 20 years ago to catch people who were under-reporting gold wastage and pocketing the shortfall.

AscdrenPost authorSo what you're saying is that if you're going to be committing fraud you should remember benfords law

MrImarcusPost authorMy eyes were hurting watching you NOT blink… How much coffee do you drink???

SadNTasteless VeganPost author3:20 "that one guy in China" those dam Chinese and their dammed 3 inches.

Jacquelyn ScottPost authorYou look so happy and excited at the fade-in at 3:26! 😀

Sport und AnderesPost authorBut when I know that forensics looks out for that distribution then I can counteract it easily.

modi XPost authorWow, school failed to teach me how log really works, and you explain me it in 10 minutes and I understand o.o

Leo179Post authorviews: 411k

dislikes: 98

likes: 7.5k

comments: 803

not quite (although, to be fair)

subs:

1_99k

Nikola NegovanovicPost authorIt is because natural quantities are distributed geometrically, meaning that the ratios instead of the differences of the numbers are what matter.

Steve ZelaznikPost authorI looked at the number of views for all the recommended videos to the right of my screen. (Your results may vary). 12 out of the 39 videos had a number of views that started with the number "1".

VENOM GAM3RPost authorNice

steve mcdonaldPost authorgreetings Singingbannana..i have a question for you..in speaking of the Triple Alpha Process, i think that astrophysicists have been fudging their numbers in order to make the process appear to account for the carbon in the universe, in order to prop up their naturalist theory..what are your thoughts on this?..thank you in advance!…a big fan!