Is Big Brother Really Recording all Phone Calls?
From Quora https://www.quora.com/profile/Tony-Ratagick-1
I’ve read through this entire thread and its many responses, and I’m going to chime in with my two cents and side with Paul Losleben and Harry Horse. Before I begin, let me say that I’m not responding to any of the political sides of this debate, only the technical.
Paul Sutton, in several of your responses you bring up your background in the industry, so I’ll briefly summarize my experience as well.
I have over 40 years in the telecommunications industry. I was initially trained by the US Army in long line and satellite communications, have done the same for offshore oil platforms, and been critically involved as a contractor for the Department of Defense and a few of the “alphabet agencies” of the federal government.
I am currently employed in the commercial sector as a pre-sales solution engineer for a company that provides call recording, speech recognition, transcription and voice analytics. My job is to design solutions for organizations that wish to record and analyze telephone conversations, for legal or quality purposes. My customers are mostly the Fortune (insert relevant number here).
Also, for further background, there are three primary vendors in this industry, and a host of smaller ones. I work for one of the top three, and I can attest that we all have the same basic math challenges in recording calls. In other words my competitors are no different than I am.
In summary, the subject of recording calls is exactly what I currently do, every day.
I’m addressing the assertion that “every telephone conversation in the US is recorded for posterity by the NSA.”
I do not claim it is impossible. I do mean to show that it is highly improbable given the logistics required and the level of technology currently available.
So let's break this down into the challenges a large organization would face if they came to me today and asked for a solution. It's really simple math. I will look at the total big picture here as a combined solution for this hypothetical NSA problem.
After reading the thread and all of the technical responses, some of them going down really deep rabbit holes calling out technologies that have no bearing on a high-scale recording solution such as this, no one seems to be looking at the actual technical limiting factor in this problem.
Regardless of your transport layer (I really don't care if you’re doing G.711 or G.729 VoIP, SIP, some analog transport layer, or any of the other variants), on/off-hook state, yada yada yada, when audio is captured it has to be converted into usable form through an audio codec. This is where the bottleneck occurs. I used to get wrapped up in all that too, when I first made the transition from straight telephony applications to recording telephony applications.
A basic rule of thumb for audio recording codecs is about 500 KB per minute of audio storage for uncompressed audio, slightly less for lossless compression, or 100 KB per minute of storage for lossy compression - PER CALL.
(If you’re trying to do the math, you can stop reading here: multiply the numbers above by the total number of calls generated on any given day in the US alone, conservatively 9 billion, times the average call length of 3 minutes, and come up with a storage requirement based on whether you intend to compress or not. Roughly 13 petabytes PER DAY without compression, or 3 petabytes per day compressed.)
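The storage arithmetic above can be sketched directly. Here is a minimal check using the figures from this answer (9 billion calls per day, a 3-minute average, and the per-minute rules of thumb from the codec paragraph, with decimal units):

```python
# Back-of-the-envelope storage math for recording every US call.
# Figures are the estimates used above, not measured values.
CALLS_PER_DAY = 9e9            # conservative estimate of daily US calls
AVG_CALL_MINUTES = 3           # average call length
UNCOMPRESSED_KB_PER_MIN = 500  # rule of thumb; e.g. G.711 at 64 kbit/s
                               # is 64_000 / 8 * 60 = 480,000 bytes/min
LOSSY_KB_PER_MIN = 100         # rule of thumb for lossy compression

def petabytes_per_day(kb_per_min):
    """Daily storage, in petabytes, at a given per-minute recording rate."""
    total_kb = CALLS_PER_DAY * AVG_CALL_MINUTES * kb_per_min
    return total_kb * 1e3 / 1e15   # KB -> bytes -> petabytes

print(f"uncompressed: {petabytes_per_day(UNCOMPRESSED_KB_PER_MIN):.1f} PB/day")
print(f"compressed:   {petabytes_per_day(LOSSY_KB_PER_MIN):.1f} PB/day")
```

Which lands on the same ballpark as the paragraph above: roughly 13.5 PB/day uncompressed, 2.7 PB/day with lossy compression.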
There are a lot of different codecs out there, but they all have the same basic math.
With or without compression is important for a couple of really big reasons.
1. Compression and converting into the correct codec isn't magic. It requires compute power. The more you compress, the more compute power is required. If you skimp on CPU resources here, the audio will eventually get compressed, but your systems will get logjammed and never catch up, meaning that by the time we get a recording stored for analysis or playback, it could be days, weeks or years later, assuming the resources responsible for it didn't crash in the meantime. Therefore, CPU cores have to be dedicated to this task.
2. Transcription and other advanced analytics require a clean uncompressed audio recording to produce reliable results. You can certainly run transcription against a compressed audio file, but the accuracy drops off significantly, making the results unreliable.
Next, let's look at that transcription piece. This requires more compute power than most people have seen in their entire careers. I’m really not exaggerating. Today, most transcription companies are employing advanced GPUs in addition to CPU cores. Even then, it's a ridiculously expensive part of the solution. I’ve shown these requirements to customers with really deep pockets and seen them physically react, like they were punched in the gut. Again, it's not about IF it can be done, but HOW FAST it can be done to make the results usable.
There seemed to be a lot of effort in the other posts to discuss how various government agencies would host all of these compute and storage resources. This is actually the easier part, as the feds and most of their component agencies have embraced cloud computing and storage for some time. Amazon and Azure have dedicated resources specifically set up for government agencies, with additional security and segregation as needed. That said, it still costs money and isn't unlimited. “The Cloud” is just a cool name for “Other People's Hardware”.
The bigger problem here is the sheer volume of the result data produced and what you want to do with it after it's been captured and analyzed.
You've already invested a disgusting amount of time and money into capturing all 9 billion calls per day, the compute resources to convert that into a usable audio format, the storage to put it all somewhere, more compute resources to transcribe and analyze it, and now you’re going to throw even more database and compute power at it to pull out relevant data or “flag” it for follow-up. So let's now assume that you had all of the financial and technical resources to do all that (being the federal government, that's not too hard to imagine, though for scale, it would be on par with the Apollo Program in terms of both human and technical resources required).
Where do you propose to get the human resources to review all of this now-filtered data? We aren't going to let machines prosecute targets. (I’m sure there’s some conspiracy theorist out there who truly believes this is already being done.)
So let's do some exploratory math.
Let's say that one tenth of one percent (0.001) of the captured conversations in the US have some verbiage in them that requires additional follow-up by the NSA or other alphabet agency. Not an unreasonable number actually. The solution you just proposed has the ability to capture and filter out that small percentage of calls for further review and it has done that.
Many of these filtered conversations will be as simple as little Timmy calling Aunt Jenny and talking about the cool science experiment he’s working on for the Science Fair.
That would mean that you have Nine Million calls to review. With a human analyst. Today. Repeat tomorrow. You already threw technology at it to get the number down that low, so now it's a human problem.
The average Quality Assurance Analyst at a commercial or government agency currently reviews about 5–10 calls a day. QA Analysts have some of the highest burnout rates in the industry, as just sitting there listening to recordings all day for something relevant or interesting will drive most people insane.
So you’ll need to hire Nine Hundred Thousand Human Analysts to review just the recordings that got flagged as “potentially interesting”. And keep an ongoing recruitment effort up to replace half that number every year.
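The staffing arithmetic above works out as follows, using the assumptions stated in this section (a 0.1% flag rate and the upper end of the 5–10 calls per analyst per day review rate):

```python
# Human review workload if 0.1% of daily US calls get flagged.
CALLS_PER_DAY = 9e9       # daily US calls, as estimated above
FLAG_RATE = 0.001         # one tenth of one percent flagged for review
CALLS_PER_ANALYST = 10    # upper end of a QA analyst's daily review rate

flagged = CALLS_PER_DAY * FLAG_RATE        # ~9 million flagged calls/day
analysts = flagged / CALLS_PER_ANALYST     # ~900,000 analysts needed
print(f"{flagged:,.0f} flagged calls/day -> {analysts:,.0f} analysts")
```

Even at 10 calls per analyst per day, the flagged traffic alone demands a workforce several times the size of the entire intelligence community.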
Good luck.
Most of my really large customers start out with expectations that they’ll record every call. Then they start understanding what that really means. Some continue it, out of legal necessity, but use a filtering process on the front end to immediately dump the majority of calls so they don't have a long-term storage problem. Others eventually come up with some kind of logic to decide which calls should be recorded in the first place and why. So they may only record calls made to a specific number, for example.
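A front-end “record or not” decision like the one just described is conceptually a simple predicate evaluated at call setup. A minimal sketch (the field names and the number list are hypothetical, for illustration only):

```python
# Hypothetical front-end filter: record only calls to numbers of interest.
RECORD_THESE_NUMBERS = {"+1-800-555-0100", "+1-800-555-0199"}  # example list

def should_record(call):
    """Decide at call setup whether this call gets recorded at all."""
    return call["dialed_number"] in RECORD_THESE_NUMBERS

calls = [
    {"caller": "+1-212-555-0123", "dialed_number": "+1-800-555-0100"},
    {"caller": "+1-212-555-0456", "dialed_number": "+1-202-555-0000"},
]
recorded = [c for c in calls if should_record(c)]
print(f"recording {len(recorded)} of {len(calls)} calls")
```

The point is that the decision happens before any audio is captured, so the storage and compute math above never applies to the calls that are dropped.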
These are the challenges faced by large enterprise commercial organizations recording only those calls coming into their environment. This is usually an exercise where I calculate the solution requirement for millions of calls, annually.
Recording every call made in the US is a ridiculously complex problem, with very limited benefit, and I believe that even the NSA would question the logic of doing it.
Capturing metadata, on the other hand - totally reasonable and probable.
=============
https://www.quora.com/profile/Paul-Losleben
Sorry, but there is a lot of misinformation floating around about the UDC. Here is my analysis done back in 2013. I think the numbers are still relevant within an order of magnitude or so:
“The Internet backbone traffic in 2011 was between 3.4x10^18 and 4.0x10^18 bytes per year. Note that this does not include telephone traffic except that done over VOIP. Let's consider only digital transmission of telephone conversations and let's say that we use only the lower part of the spectrum, say <5 kHz, sample at twice that frequency or 10 kHz, use 16-bit encoding (2 bytes) and digital compression of about 2X. That yields 10^4 bytes of data for every second of every telephone call. According to the BLS, the average American spends 0.75 hours per day on the telephone. Let's say 2.5x10^8 people are old enough to be using the telephone. That gives 2.5x10^8 x 0.75 x 60 x 60 x 10^4 = 6.75x10^15 bytes of telephone conversation data every day, or 2.5x10^18 bytes of data per year.
“Between just the Internet and phone conversations, we have nearly 10^19 bytes of data per year that the NSA would have to store. And, this only includes telephone traffic in the USA. And, this doesn't include all the other types of data being moved around the world.”
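The quoted 2013 arithmetic can be checked directly, using exactly the stated assumptions (10 kHz sampling, 16-bit samples, roughly 2x compression, 0.75 hours per day across 2.5x10^8 people):

```python
# Verifying the per-year telephone data estimate quoted above.
SAMPLE_RATE_HZ = 10_000        # 2x a <5 kHz voice band
BYTES_PER_SAMPLE = 2           # 16-bit encoding
COMPRESSION = 2                # ~2x digital compression
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE / COMPRESSION  # 1e4

PEOPLE = 2.5e8                 # Americans old enough to use a phone
SECONDS_PER_DAY = 0.75 * 3600  # 0.75 hours on the phone per day

per_day = PEOPLE * SECONDS_PER_DAY * bytes_per_second   # ~6.75e15 bytes
per_year = per_day * 365                                # ~2.5e18 bytes
print(f"{per_day:.2e} bytes/day, {per_year:.2e} bytes/year")
```

The numbers reproduce the quote: about 6.75x10^15 bytes per day, or roughly 2.5x10^18 bytes per year.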
Even if it were possible to build a disk farm capable of holding that much data, it would have cost more than NSA’s entire budget at the time. In the past 6 years, the problem has become a lot bigger.
No, NSA is not recording everything.
==
Yes, NSA has been allowed to do this since at least 1978, as outlined in the Foreign Intelligence Surveillance Act of 1978, which allowed them to do this for communication to or between foreign persons. It was broadened to include domestic persons in 2001 in the Patriot Act.
The question raised here regarded the belief that the NSA was recording all phone calls, both foreign and domestic. I contend that this is not technically feasible. Here is my analysis of this, done in 2013 in another forum:
————————————
The Internet backbone traffic in 2011 was between 3.4x10^18 and 4.0x10^18 bytes per year. Note that this does not include telephone traffic except that done over SKYPE [and similar applications]. Let's consider only digital transmission of telephone conversations and let's say that we use only the lower part of the spectrum, say <5 kHz, sample at twice that frequency or 10 kHz, use 16-bit encoding (2 bytes) and digital compression of about 2X. That yields 10^4 bytes of data for every second of every telephone call. According to the BLS, the average American spends 0.75 hours per day on the telephone. Let's say 2.5x10^8 people are old enough to be using the telephone. That gives 2.5x10^8 x 0.75 x 60 x 60 x 10^4 = 6.75x10^15 bytes of telephone conversation data every day, or 2.5x10^18 bytes of data per year.
Between just the Internet and phone conversations, we have nearly 10^19 bytes of data per year that the NSA would have to store. And, this only includes telephone traffic in the USA. And, this doesn't include all the other types of data being moved around the world.
Not bloody likely!
———————————————
The problem becomes more difficult with the increasing use of packet-switched networks, and yes, there have been technical advances since 2013. Still, it is not economically feasible to store all telephone conversations. It is, however, possible to store metadata about who is talking to whom, and using this information to do network analysis allows selective recording of future conversations under warrant as specified by the Patriot Act.
=============