Is Big Brother Really Recording all Phone Calls?
From Quora https://www.quora.com/profile/Tony-Ratagick-1
I’ve read through this entire thread and its many responses, and I’m going to chime in with my two cents and side with Paul Losleben and Harry Horse. Before I begin, let me say that I’m not responding to any of the political sides of this debate, only the technical.
Paul Sutton, in several of your responses you bring up your background in the industry, so I’ll briefly summarize my experience as well.
I have over 40 years in the telecommunications industry. I was initially trained by the US Army in long line and satellite communications, have done the same for offshore oil platforms, and been critically involved as a contractor for the Department of Defense and a few of the “alphabet agencies” of the federal government.
I am currently employed in the commercial sector as a pre-sales solution engineer for a company that provides call recording, speech recognition, transcription and voice analytics. My job is to design solutions for organizations that wish to record and analyze telephone conversations, for legal or quality purposes. My customers are mostly the Fortune (insert relevant number here).
Also, for further background, there are three primary vendors in this industry, and a host of smaller ones. I work for one of the top three, and I can attest that we all have the same basic math challenges in recording calls. In other words my competitors are no different than I am.
In summary, the subject of recording calls is exactly what I currently do, every day.
I’m addressing the assertion that “every telephone conversation in the US is recorded for posterity by the NSA.”
I do not claim it is impossible. I do mean to show that it is highly improbable given the logistics required and the level of technology currently available.
So let's break this down into the challenges a large organization would face if they came to me today and asked for a solution. It's really simple math. I will look at the total big picture here as a combined solution for this hypothetical NSA problem.
After reading the thread and all of the technical responses, some of them going down really deep rabbit holes calling out technologies that have no bearing on a high-scale recording solution such as this, no one seems to be looking at the actual technical limiting factor in this problem.
Regardless of your transport layer (I really don't care if you’re doing G.711 or G.729 VoIP, SIP, some analog transport layer, or any of the other variants), on/off-hook state, yada yada yada, when audio is captured it has to be converted into usable form through an audio codec. This is where the bottleneck occurs. I used to get wrapped up in all that too, when I first made the transition from straight telephony applications to recording telephony applications.
A basic rule of thumb for audio recording codecs is about 500 KB per minute of audio storage for uncompressed audio, slightly less for lossless compression, or 100 KB per minute of storage for lossy compression - PER CALL.
(If you’re trying to do the math, you can stop reading here: multiply the numbers above by the total number of calls generated on any given day in the US alone, conservatively 9 billion, times the average call length of 3 minutes, and come up with a storage requirement based on whether you intend to compress or not. Roughly 13 petabytes PER DAY without compression, or 3 petabytes per day compressed.)
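The storage arithmetic above can be sketched directly. Here is a minimal check using the figures from this answer (9 billion calls per day, a 3-minute average, and the per-minute rules of thumb from the codec paragraph, with decimal units):

```python
# Back-of-the-envelope storage math for recording every US call.
# Figures are the estimates used above, not measured values.
CALLS_PER_DAY = 9e9            # conservative estimate of daily US calls
AVG_CALL_MINUTES = 3           # average call length
UNCOMPRESSED_KB_PER_MIN = 500  # rule of thumb; e.g. G.711 at 64 kbit/s
                               # is 64_000 / 8 * 60 = 480,000 bytes/min
LOSSY_KB_PER_MIN = 100         # rule of thumb for lossy compression

def petabytes_per_day(kb_per_min):
    """Daily storage, in petabytes, at a given per-minute recording rate."""
    total_kb = CALLS_PER_DAY * AVG_CALL_MINUTES * kb_per_min
    return total_kb * 1e3 / 1e15   # KB -> bytes -> petabytes

print(f"uncompressed: {petabytes_per_day(UNCOMPRESSED_KB_PER_MIN):.1f} PB/day")
print(f"compressed:   {petabytes_per_day(LOSSY_KB_PER_MIN):.1f} PB/day")
```

Which lands on the same ballpark as the paragraph above: roughly 13.5 PB/day uncompressed, 2.7 PB/day with lossy compression.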
There are a lot of different codecs out there, but they all have the same basic math.
With or without compression is important for a couple of really big reasons.
1. Compression and converting into the correct codec isn't magic. It requires compute power. The more you compress, the more compute power is required. If you skimp on CPU resources here, the audio will eventually get compressed, but your systems will get logjammed and never catch up, meaning that by the time we get a recording stored for analysis or playback, it could be days, weeks or years later, assuming the resources responsible for it didn't crash in the meantime. Therefore, CPU cores have to be dedicated to this task.
2. Transcription and other advanced analytics require a clean uncompressed audio recording to produce reliable results. You can certainly run transcription against a compressed audio file, but the accuracy drops off significantly, making the results unreliable.
Next, let's look at that transcription piece. This requires more compute power than most people have seen in their entire careers. I’m really not exaggerating. Today, most transcription companies are employing advanced GPUs in addition to CPU cores. Even then, it's a ridiculously expensive part of the solution. I’ve shown these requirements to customers with really deep pockets and seen them physically react, like they were punched in the gut. Again, it's not about IF it can be done, but HOW FAST it can be done to make the results usable.
There seemed to be a lot of effort in the other posts to discuss how various government agencies would host all of these compute and storage resources. This is actually the easier part, as the feds and most of their component agencies have embraced cloud computing and storage for some time. Amazon and Azure have dedicated resources specifically set up for government agencies, with additional security and segregation as needed. That said, it still costs money and isn't unlimited. “The Cloud” is just a cool name for “Other People's Hardware”.
The bigger problem here is the sheer volume of the result data produced and what you want to do with it after it's been captured and analyzed.
You've already invested a disgusting amount of time and money into capturing all 9 billion calls per day, the compute resources to convert that into a usable audio format, the storage to put it all somewhere, more compute resources to transcribe and analyze it, and now you’re going to throw even more database and compute power at it to pull out relevant data or “flag” it for follow-up. So let's now assume that you had all of the financial and technical resources to do all that (being the federal government, that's not too hard to imagine, though for scale, it would be on par with the Apollo Program in terms of both human and technical resources required).
Where do you propose to get the human resources to review all of this now-filtered data? We aren't going to let machines prosecute targets. (I’m sure there’s some conspiracy theorist out there who truly believes this is already being done.)
So let's do some exploratory math.
Let's say that one tenth of one percent (0.001) of the captured conversations in the US have some verbiage in them that requires additional follow-up by the NSA or other alphabet agency. Not an unreasonable number actually. The solution you just proposed has the ability to capture and filter out that small percentage of calls for further review and it has done that.
Many of these filtered conversations will be as simple as little Timmy calling Aunt Jenny and talking about the cool science experiment he’s working on for the Science Fair.
That would mean that you have Nine Million calls to review. With a human analyst. Today. Repeat tomorrow. You already threw technology at it to get the number down that low, so now it's a human problem.
The average Quality Assurance Analyst at a commercial or government agency currently reviews about 5–10 calls a day. QA Analysts have some of the highest burnout rates in the industry, as just sitting there listening to recordings all day for something relevant or interesting will drive most people insane.
So you’ll need to hire Nine Hundred Thousand Human Analysts to review just the recordings that got flagged as “potentially interesting”. And keep an ongoing recruitment effort up to replace half that number every year.
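The staffing arithmetic above works out as follows, using the assumptions stated in this section (a 0.1% flag rate and the upper end of the 5–10 calls per analyst per day review rate):

```python
# Human review workload if 0.1% of daily US calls get flagged.
CALLS_PER_DAY = 9e9       # daily US calls, as estimated above
FLAG_RATE = 0.001         # one tenth of one percent flagged for review
CALLS_PER_ANALYST = 10    # upper end of a QA analyst's daily review rate

flagged = CALLS_PER_DAY * FLAG_RATE        # ~9 million flagged calls/day
analysts = flagged / CALLS_PER_ANALYST     # ~900,000 analysts needed
print(f"{flagged:,.0f} flagged calls/day -> {analysts:,.0f} analysts")
```

Even at 10 calls per analyst per day, the flagged traffic alone demands a workforce several times the size of the entire intelligence community.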
Good luck.
Most of my really large customers start out with expectations that they’ll record every call. Then they start understanding what that really means. Some continue it, out of legal necessity, but use a filtering process on the front end to immediately dump the majority of calls so they don't have a long-term storage problem. Others eventually come up with some kind of logic to decide which calls should be recorded in the first place and why. So they may only record calls made to a specific number, for example.
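A front-end “record or not” decision like the one just described is conceptually a simple predicate evaluated at call setup. A minimal sketch (the field names and the number list are hypothetical, for illustration only):

```python
# Hypothetical front-end filter: record only calls to numbers of interest.
RECORD_THESE_NUMBERS = {"+1-800-555-0100", "+1-800-555-0199"}  # example list

def should_record(call):
    """Decide at call setup whether this call gets recorded at all."""
    return call["dialed_number"] in RECORD_THESE_NUMBERS

calls = [
    {"caller": "+1-212-555-0123", "dialed_number": "+1-800-555-0100"},
    {"caller": "+1-212-555-0456", "dialed_number": "+1-202-555-0000"},
]
recorded = [c for c in calls if should_record(c)]
print(f"recording {len(recorded)} of {len(calls)} calls")
```

The point is that the decision happens before any audio is captured, so the storage and compute math above never applies to the calls that are dropped.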
These are the challenges faced by large enterprise commercial organizations recording only those calls coming into their environment. This is usually an exercise where I calculate the solution requirement for millions of calls, annually.
Recording every call made in the US is a ridiculously complex problem, with very limited benefit, and I believe that even the NSA would question the logic of doing it.
Capturing metadata, on the other hand - totally reasonable and probable.
=============
https://www.quora.com/profile/Paul-Losleben
Sorry, but there is a lot of misinformation floating around about the UDC. Here is my analysis done back in 2013. I think the numbers are still relevant within an order of magnitude or so:
“The Internet backbone traffic in 2011 was between 3.4x10^18 and 4.0x10^18 bytes per year. Note that this does not include telephone traffic except that done over VOIP. Let's consider only digital transmission of telephone conversations and let's say that we use only the lower part of the spectrum, say <5 kHz, sample at twice that frequency or 10 kHz, use 16-bit encoding (2 bytes) and digital compression of about 2X. That yields 10^4 bytes of data for every second of every telephone call. According to the BLS, the average American spends 0.75 hours per day on the telephone. Let's say 2.5x10^8 people are old enough to be using the telephone. That gives 2.5x10^8 x 0.75 x 60 x 60 x 10^4 = 6.75x10^15 bytes of telephone conversation data every day, or 2.5x10^18 bytes of data per year.
“Between just the Internet and phone conversations, we have nearly 10^19 bytes of data per year that the NSA would have to store. And, this only includes telephone traffic in the USA. And, this doesn't include all the other types of data being moved around the world.”
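The quoted 2013 arithmetic can be checked directly, using exactly the stated assumptions (10 kHz sampling, 16-bit samples, roughly 2x compression, 0.75 hours per day across 2.5x10^8 people):

```python
# Verifying the per-year telephone data estimate quoted above.
SAMPLE_RATE_HZ = 10_000        # 2x a <5 kHz voice band
BYTES_PER_SAMPLE = 2           # 16-bit encoding
COMPRESSION = 2                # ~2x digital compression
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE / COMPRESSION  # 1e4

PEOPLE = 2.5e8                 # Americans old enough to use a phone
SECONDS_PER_DAY = 0.75 * 3600  # 0.75 hours on the phone per day

per_day = PEOPLE * SECONDS_PER_DAY * bytes_per_second   # ~6.75e15 bytes
per_year = per_day * 365                                # ~2.5e18 bytes
print(f"{per_day:.2e} bytes/day, {per_year:.2e} bytes/year")
```

The numbers reproduce the quote: about 6.75x10^15 bytes per day, or roughly 2.5x10^18 bytes per year.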
Even if it were possible to build a disk farm capable of holding that much data, it would have cost more than NSA’s entire budget at the time. In the past 6 years, the problem has become a lot bigger.
No, NSA is not recording everything.
==
Yes, NSA has been allowed to do this since at least 1978, as outlined in the Foreign Intelligence Surveillance Act of 1978, which allowed them to do this for communication to or between foreign persons. It was broadened to include domestic persons in 2001 in the Patriot Act.
The question raised here regarded the belief that the NSA was recording all phone calls, both foreign and domestic. I contend that this is not technically feasible. Here is my analysis of this, done in 2013 in another forum:
————————————
The Internet backbone traffic in 2011 was between 3.4x10^18 and 4.0x10^18 bytes per year. Note that this does not include telephone traffic except that done over SKYPE [and similar applications]. Let's consider only digital transmission of telephone conversations and let's say that we use only the lower part of the spectrum, say <5 kHz, sample at twice that frequency or 10 kHz, use 16-bit encoding (2 bytes) and digital compression of about 2X. That yields 10^4 bytes of data for every second of every telephone call. According to the BLS, the average American spends 0.75 hours per day on the telephone. Let's say 2.5x10^8 people are old enough to be using the telephone. That gives 2.5x10^8 x 0.75 x 60 x 60 x 10^4 = 6.75x10^15 bytes of telephone conversation data every day, or 2.5x10^18 bytes of data per year.
Between just the Internet and phone conversations, we have nearly 10^19 bytes of data per year that the NSA would have to store. And, this only includes telephone traffic in the USA. And, this doesn't include all the other types of data being moved around the world.
Not bloody likely!
———————————————
The problem becomes more difficult with the increasing use of packet-switched networks, and yes, there have been technical advances since 2013. Still, it is not economically feasible to store all telephone conversations. It is, however, possible to store metadata about who is talking to whom, and using this information to do network analysis allows selective recording of future conversations under warrant as specified by the Patriot Act.
=============