Microsoft wants ALL of your data


Darkstar
20-04-2004, 06:27:02
http://aboutai.net/DesktopDefault.aspx?tabindex=1&tabid=2&article=aa112902a.htm

Basically, it's a small piece on the "Your Digital Life Worth's of Info". This isn't something new... there are plenty of companies working on the exact same thing. However, I found it interesting that Microsoft admits that Gordon Bell is recording all his phone conversations. That's illegal for non-governmental entities to do without some serious notification.

I like how they claim that they are going to make it hacker proof. That's usually the least of the worries with such things... but also, an impossible thing. What one man makes, another can break.

Sir Penguin
20-04-2004, 06:31:21
No they don't. If they wanted all our data, every line into each distributed collection point would be clogged, and there's no way they would be able to combine the data into a central repository.

SP

Darkstar
20-04-2004, 06:40:59
They do want all your data.

The key to all this is two things...

stickiness, and getting fair value out of all the dark fiber that got laid in the 90s.

Stickiness yields you (the host) many things. Like being able to look through people's data, and see what data they have, and then offer them products based on that data. It also lets you make money selling all sorts of metrics to companies, as well as make money by selling targeted ads to companies. Plus, once the host has a customer's data, they will be locked in for life. It would take them too long to transfer over to another repository.

Making the dark fibre light up is another money opportunity. Most of the bandwidth available out there still isn't being used... and they are always adding more. Whole companies have gone into bankruptcy over the cost of making that bandwidth. And their creditors want to sell the use of it.

fp@korea
20-04-2004, 06:50:25
Do they want ALL our base, too?

Sir Penguin
20-04-2004, 07:10:44
I read a statistic the other day that 50 million Americans are on broadband connections. If each of them has 20 GB of data, that means that 1 EB of data needs to be transferred. As far as I know, the fattest long public pipe in the US is an OC12 across the width of the country. That's a maximum of 622 Mbps. Even if every house in the US had its own OC12 network branch, and if Microsoft had 1000 data repositories distributed evenly around the network, it would still take more than 163 days to transfer everything to Microsoft's repositories at constant peak bandwidth. It would take 101 days to collect that data in a magical central server with 1000 Gigabit LAN cards, a universal 1 Tb data bus, and 1 EB of storage space. And then they'd have to process the data.

Not to mention server backups, which would take 44 billion of the new Blu-ray 25 GB DVDs (I bet they could get a volume discount).

SP

Sir Penguin
20-04-2004, 07:14:28
Here's the math for that (I screwed up a couple times):

SP

Sir Penguin
20-04-2004, 07:16:52
Sorry, that last line is off by 1/1024. It will actually take 43 million Blu-ray DVDs.

SP
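
The figures in the posts above (163 days, 101 days, 43 million discs) can be roughly reproduced with a short calculation. This is only a sketch: the unit conventions are an assumption (binary prefixes, so 1 EB = 2**60 bytes and 1 Mbps = 2**20 bits/s), chosen because they appear to match the quoted numbers, and the function name is made up for illustration. Note that 50 million users at 20 GB each is slightly under 1 EB in these units; the thread rounds it up.

```python
# Assumed unit conventions (not stated in the thread): binary prefixes.
EB = 2**60            # bytes in one exabyte
GB = 2**30            # bytes in one gigabyte
MBPS = 2**20          # bits per second in one Mbps
SECONDS_PER_DAY = 86_400


def transfer_days(total_bytes, links, mbps_per_link):
    """Days needed to move total_bytes over `links` parallel links,
    each running at mbps_per_link with 100% sustained throughput."""
    total_bits = total_bytes * 8
    bits_per_second = links * mbps_per_link * MBPS
    return total_bits / bits_per_second / SECONDS_PER_DAY


# 1000 repositories, each fed by one saturated OC12 (622 Mbps).
oc12_days = transfer_days(EB, links=1000, mbps_per_link=622)

# One central server with 1000 gigabit cards (1000 Mbps each).
gigabit_days = transfer_days(EB, links=1000, mbps_per_link=1000)

# Number of 25 GB discs needed to hold 1 EB.
blu_ray_discs = EB / (25 * GB)

print(f"OC12 case:    {oc12_days:.1f} days")
print(f"Gigabit case: {gigabit_days:.1f} days")
print(f"25 GB discs:  {blu_ray_discs / 1e6:.1f} million")
```

With decimal prefixes instead, the numbers shift by several percent but stay in the same range.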

Darkstar
20-04-2004, 07:59:52
You know how few banks of petabyte drives you'd need?

And the point is that you wouldn't be transferring 20 GB of data per day. Unless you are DVRing all your shows down from your personal station to the repository. And that isn't what would happen... they'd have it act as your DVR.

Storage grows faster than anything else in computers. And centralizing it means you get some serious economy of scale. You would only need a couple of backup centers for roll-over on catastrophic failure.

And the last I heard, it was only... er... I'll have to come back and edit that. My HD Search is still underway for the latest figures...

Darkstar
20-04-2004, 08:01:58
Ok...

The number of high-speed lines connecting U.S. businesses and homes to the Internet jumped 18 percent to 23.5 million lines during the first half of 2003, according to statistics in a report released Monday.


my HD search is still going on, looking for any latest figures... but I don't recall reading that the actual number of broadband had hit 50 Million in the States.

Darkstar
20-04-2004, 08:26:08
Ah. Had to load up the latest backup of my work archives.

According to a study research firm In-Stat/MDR released Tuesday, nearly 27 million U.S. businesses and home users were subscribed to a broadband service at the end of 2003, a 48 percent increase from the previous year.

Not close to 50 Million. Just under 10% of the US population at the moment. And computer/digital media analysis companies are already estimating that the broadband rush is over... That there will be a small trickle of conversion from dial up or never subscribed to broadband services.

Lurker
20-04-2004, 13:02:52
From this (http://www.nytimes.com/2004/04/19/technology/19DIAL.html)

The situation is likely to change as more users move to broadband. In 2003, 23 million households had high-speed access, up from 16 million the year before, according to the Yankee Group, a research firm. In 2003, 51 million American households connected to the Internet through a dial-up connection, down from 55 million a year before, the firm reported.

Obviously, 23 million households means a lot more than 23 million people. Probably a lot closer to 50 million people than 27 million.

Plus this from the same article:

If office connections are counted, 55 percent of Americans have high-speed access, according to a study released on Sunday by the Pew Internet and American Life Project, a nonprofit research group.

Sir Penguin
20-04-2004, 20:12:46
Originally posted by Darkstar
You know how few banks of petabyte drives you'd need?

Uh... 1024 PB drives.

Originally posted by Darkstar
And the point is that you wouldn't be transferring 20 GB of data per day. Unless you are DVRing all your shows down from your personal station to the repository. And that isn't what would happen... they'd have it act as your DVR.
The point is that it would take 163 days to transfer 20 GB of data from each American on the Internet, assuming that you have a complete optimizing restructure of the US network (including the installation or usurpation of 1000 evenly distributed data centres, and somehow connecting every broadband-enabled household to a direct, unshared OC12 connection to the nearest data centre) and an unprecedented 100% throughput sustained over almost half a year. Undoubtedly, the time it would take right now would measure in at least the hundreds of years. And then there's the 60% of American netizens who don't have broadband at home (see quote below)...

Originally posted by Darkstar
Storage grows faster than anything else in computers.
Now, that's a problem. I know of no filesystem that can address 1 EB of data. The largest I know of is 4 TB.

Originally posted by Darkstar
And the last I heard, it was only... er... I'll have to come back and edit that. My HD Search is still underway for the latest figures...
http://www.usatoday.com/tech/webguide/internetlife/2004-04-18-broadband_x.htm:

As of February, 48 million folks, or 39% of Net users, had adopted speedy access at home. That's a 60% climb compared with the 30 million home users in March 2003.
As Lurker pointed out, there's more than 1 person per household, so that's actually 50 million (rounded up from 48) OC12 lines at 1 per person and 20 GB per person, not per house. My bad. :)

SP

Darkstar
21-04-2004, 07:05:41
So you'd need an IBM cube cluster, Penguin. No biggie. And it's Microsoft paying for it.

And you aren't going to accumulate very much data in a day.

Keep an eye out for this. It's the concept behind GMail (and the recently announced competitors to GMail that should be making it public in the next 2 months). All your data in one place.

File systems will be expanded, redesigned, refactored, and recycled.

Sir Penguin
21-04-2004, 07:33:09
I'm not sure you're clear on the scale here...

Taking the average speed per user down to 512 Kbps (still higher than reality when you include the dialup users), it would take 557 years to transfer all the data to 1000 distributed collection points (which would still require millions of dollars to create). That still doesn't solve the problem of fitting that data onto one server. A $1 million cluster isn't going to cut it, it's technologically impossible to access that much data at once, and physically impractical to make any meaningful reports on data subsets.

What the hell would Microsoft do with all that data? Provide a backup service for the whole nation?

SP
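
For reference, the 557-year figure can be reproduced if each of the 1000 collection points is treated as a single 512 Kbps link. That reading is an inference from the numbers, not something the post states, and the units (binary prefixes, 365-day years) are likewise assumed. The sketch below also shows, for contrast, the time for one user to upload 20 GB at 512 Kbps when nothing upstream is shared; the helper name is hypothetical.

```python
# Assumed unit conventions (not stated in the thread): binary prefixes.
EB = 2**60            # bytes
GB = 2**30            # bytes
KBPS = 2**10          # bits per second in one Kbps
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def seconds_to_transfer(total_bytes, links, kbps_per_link):
    """Seconds to move total_bytes over `links` parallel links at full rate."""
    return total_bytes * 8 / (links * kbps_per_link * KBPS)


# Reading 1: 1 EB funnelled through 1000 links of 512 Kbps each.
bottleneck_years = seconds_to_transfer(EB, 1000, 512) / SECONDS_PER_YEAR

# Reading 2: one user pushing 20 GB over a dedicated 512 Kbps uplink.
per_user_days = seconds_to_transfer(20 * GB, 1, 512) / SECONDS_PER_DAY

print(f"1000 shared 512 Kbps links: {bottleneck_years:.0f} years")
print(f"one user, own 512 Kbps link: {per_user_days:.1f} days")
```

The gap between the two results shows how much the estimate depends on where the bottleneck is assumed to be.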

Darkstar
21-04-2004, 18:57:50
Why would you need 1 million servers? Distributed computing will eventually scale up to handle all of it, anyways, Arctic Knight. That's its goal. You won't need 1 million servers... the world will be 3 big grids... Microsoft, Linux, and Other. And they will interact as their creators and maintainers allow.

Plus, bandwidth is going to continue to grow. In the next 10 years, the average bandwidth available is being projected as growing by a factor of 10 BILLION at a minimum, via various technologies just beginning commercialization at this time.

15 years ago, what we have now was considered "impossible" due to its scale. And 40 years ago, what we had 15 years ago was considered "impossible" on that scale. Maybe we won't have the capabilities in 5 or 10 years, but it is probably going to happen before your future kids get into high school, Penguin. Ubiquitous computing with ubiquitous storage.

Additionally, you are going to see basic "compression"... after all, you wouldn't be the only one to "record" the broadcast of the Friends, "Junior" Finale. All those digital "tv broadcast" recordings that started at 8:00 PM for your "local geo-physical" area will all end up being one file, pointed at and "unknowingly shared" by all of you that recorded it. When you branch it off (adding in your own notes, comments, and graphical highlights and drawings), that will be branched off. Most of that compare and "reduce" will take place by minor intelligent agents. Just hope they don't bug out and delete everything for the last 10 days when they get "upgraded". But if they do, your data will only be gone for a short time... as backups will be required just to cover the normal outages in a distributed grid.

The scale is insignificant, Penguin. It's going to happen. The only question is how many security chips it's going to require you to have implanted, so all your wireless equipment can access your personal data, and what the people who go "off-grid" are going to do. It's going to result in a significant shift in our culture.

Sir Penguin
21-04-2004, 19:46:41
Well, I think that the idea that average bandwidth available to a person at home will grow by 10 billion times in 10 years is absolute, unbelievable bullshit. The move from dialup to cable boosted the average bandwidth (for just those who upgraded, this doesn't include the 60% of Americans still on dialup) around 10-15 times. Do you seriously think that regular people will have Petabit connections to the Internet within 10 years?

SP

Darkstar
21-04-2004, 22:51:20
Once everything is already in the storage, SP, you won't need that much bandwidth in the first place, will you?

Realistically, I don't know. Depends on how things work out. Could be we only have 100,000x available bandwidth. Could be only 200x. Part of that drive will be what companies think the consumer will want, "tomorrow". And there's not that much out there digitally that really needs a huge pipe. But regardless, that won't be a constant need...

And what are we calling regular people? Microsoft techies? Mid-Western traditional family farmers? South African diamond miners? Siberian nomads? Beijing bureaucracy workers? Some people are definitely going to have a mind-staggeringly huge pipe compared to today's rates. And if it takes longer than 10 years, it's still going to happen in the next 30 or 50 years. Unless the world ends between now and then.

protein
21-04-2004, 23:04:19
geek forum!

zmama
21-04-2004, 23:09:58
Why, yes it is!

We missed you Protein, welcome back into the geek dungeon....ummm I mean forum.

Sir Penguin
22-04-2004, 00:04:09
No, you won't need the bandwidth once everything's in your magical storage facility, but it's got to get in storage somehow. :)

By regular people I meant the average, but there's no way even the high-end power user will have a few gigabits to their home, let alone a Pb. Right now, it would take me less than 2 hours to download a complete 4.7 GB DVD image from a good mirror at 64% of my bandwidth capacity (I often hit that limit), and that's with only a 10 Mbps connection.

SP
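
The download estimate in the post above is easy to check. A minimal sketch, assuming decimal units (4.7 GB = 4.7e9 bytes, 10 Mbps = 10e6 bits/s) and ignoring protocol overhead; the function name is illustrative only.

```python
def download_hours(size_gb, link_mbps, utilization):
    """Hours to download size_gb (decimal GB) over a link_mbps link
    running at the given fraction of its capacity."""
    total_bits = size_gb * 1e9 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / 3600


# A 4.7 GB DVD image over a 10 Mbps connection at 64% utilization.
dvd_hours = download_hours(4.7, 10, 0.64)
print(f"{dvd_hours:.2f} hours")
```

Under these assumptions the result comes out a little over an hour and a half, consistent with "less than 2 hours".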

Sir Penguin
22-04-2004, 00:06:40
What consumers want today is for movies and games to be released for cheap on Bittorrent. The companies don't seem to be responding to that desire.

SP

Deacon
22-04-2004, 02:20:42
I see broadband rollout as the biggest obstacle to lighting up the dark fiber. Even though a recent poll/study concluded that 2 out of 5 internet users are on broadband in the US, that means that 3 out of 5 are not using broadband. The only options seem to be wireless, DSL, satellite, and cable. I've heard about the potential of power lines for broadband internet transmissions, but nothing concrete has appeared AFAIK.

On indexing, it'll take a powerful AI to "read" documents, "listen" to audio, etc. and index it all so that a user can use natural language queries. Obviously, they'll want this to be done on the client side if there are going to be backup servers in the picture. The "answer files" can be regurgitated for the clients to translate queries into hierarchical file names.

I think it's doable eventually. It's less of a problem than trying to figure out how to make computers actually understand what they're doing.

The bandwidth issue, like the AI issue, will require advances. New compression schemes can shrink multimedia. Possibly we'll see PCs that de-emphasize the CPU and use dedicated circuitry to work with key codecs. And then, if changes are incremental, only the diffs need to be transmitted.

All this could still take a while. :)

Darkstar
22-04-2004, 02:34:50
Originally posted by Sir Penguin
What consumers want today is for movies and games to be released for cheap on Bittorrent. The companies don't seem to be responding to that desire.

SP

But the students and BitTorrent users are. ;)

Sir Penguin
22-04-2004, 02:38:30
They don't release for cheap exactly. :)

SP

Darkstar
22-04-2004, 02:47:56
I'm glad to see Deacon sees it's going to happen.

For start-up adults, it could be a while to get everything they want uploaded. But not for people born after the system goes live. Their "personal" record will start with their sonograms, etc. And have pointers to their parents' and relatives' video/stills portals or individual "tapings" (like Mom screaming at Dad who is videotaping the labor ;)). It's you, Pengy, that's got the problem of getting your 12,000 DVD movies into your personal drop point. ;) And scanning in all your school work and doodles that you want to keep, etc.

The new technologies that get deployed will service large areas at once, so it's not like they are stringing yet more cable to your house. And they are already deploying some of those technologies in extremely rural areas around the world, as it's easier and cheaper... and results in between x3 and x100 current MAX cable modem rates (depending on the particular tech). Runs cheaper, massive access for pennies on the year, to the end user.

Of course, once someone has ALL your data in their system (and the problem with this layout is that every employer you ever work for will want you to only use their data store, and never, ever, access your own (to prevent you copying out their data to your data store)), they can then charge you outrageously. What else are you going to do? You are going to want access to the photos and video and recordings of your parents, siblings, grand-parents, pets, and special events. This is a prime reason why Oracle wants in on the life-store deal, as well as 3Com, IBM, Microsoft, Google, Yahoo, and a host of other companies, actually.

Power line access is being tested in certain markets in the US already, Deacon. I recall reading that it was being test marketed by three different power utility companies. Arizona, Colorado, and California spring to mind, but I'd need to see if I could dig up the actual mentions in my work archive.

Sir Penguin
22-04-2004, 02:52:03
If what you're talking about actually happens, it will happen so far in the future that planning for it is as worthwhile as planning for an alien invasion.

SP

Sir Penguin
22-04-2004, 02:55:11
SP

Darkstar
22-04-2004, 03:02:08
Originally posted by Sir Penguin
If what you're talking about actually happens, it will happen so far in the future that planning for it is as worthwhile as planning for an alien invasion.

You know, Reagan planned for Alien Invaders.

Many professional security spooks around the world still plan for alien invasions. As do many simply odd private citizens.

I think you are going to hear about a serious revolution in broadband access... and if Google sticks to this GMail, you will see most of the majors diving in and offering 10, 20, and eventually 100G for your private data store. They'll hand out new browsers with all the latest features, and those browsers will store EVERYTHING you ever access in that datastore. If you were a SF "tapeworm" (someone that recorded every conversation and everything you ever read), all that would fit onto one 5 GB drive, even for the most talkative politician or long-posting physicist giving dissertations and debates on his latest Master Grand Unified High Theory of Everything. Most men wouldn't be able to fill 2 Gig, most women wouldn't fill 3 Gig, even living to be 120 years old.

Of course, if you want to keep a record of all the Daily Shows in HD, you are going to go past what you need for pure "words for life". ;)

Sir Penguin
22-04-2004, 04:13:57
Actually, 5 GB holds just 60 days of 7 kbps sound. Recording for just 1 hour per day, 5 GB would hold 5 real-time years' worth of voice.

SP
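
The audio estimate above depends on the exact bitrate and unit convention, neither of which the post spells out. As a rough sketch, assuming binary prefixes (5 GB = 5 * 2**30 bytes, 1 kbps = 2**10 bits/s), a literal 7 kbps stream comes out nearer 69 days of continuous audio, while 8 kbps lands close to the quoted 60 days. Either way, one hour of recording per day works out to about 4 to 5 years. The helper name is hypothetical.

```python
GB = 2**30            # bytes; binary prefix is an assumption
KBPS = 2**10          # bits per second in one kbps; also an assumption
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def audio_seconds(capacity_bytes, bitrate_kbps):
    """Seconds of audio that fit in capacity_bytes at a constant bitrate."""
    return capacity_bytes * 8 / (bitrate_kbps * KBPS)


capacity = 5 * GB

# Continuous recording, at the quoted 7 kbps and at 8 kbps for comparison.
continuous_days_7kbps = audio_seconds(capacity, 7) / SECONDS_PER_DAY
continuous_days_8kbps = audio_seconds(capacity, 8) / SECONDS_PER_DAY

# Recording one hour per day at 7 kbps: hours of audio == days covered.
years_at_one_hour_per_day = audio_seconds(capacity, 7) / SECONDS_PER_HOUR / 365

print(f"7 kbps continuous: {continuous_days_7kbps:.1f} days")
print(f"8 kbps continuous: {continuous_days_8kbps:.1f} days")
print(f"7 kbps, 1 h/day:   {years_at_one_hour_per_day:.1f} years")
```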

No longer Trippin
22-04-2004, 05:02:44
I love how you two get along, it makes for some amusing reading.

SP, actually, if we switched to a modern protocol and left TCP/IP behind, we could easily be pulling WAY more, so the physical infrastructure doesn't have to change as much for DS's figure. Still a good bit, but not nearly as much. I'll see if I can dig up a link on the protocol.

Deacon
22-04-2004, 05:30:55
Most of us don't talk all the time. And if a computer has speech recognition, the text can be stored and played back with a voice synthesizer. An "early" system might switch the mic on when some threshold is passed, storing only the text. A later system might have enough storage to leave the mic open all the time and store all the audio. :)

Sir Penguin
22-04-2004, 05:35:58
I suspect they'll switch to a modern protocol stack soon after Intel stops selling x86-derived CPUs. :)

SP

Sir Penguin
22-04-2004, 05:44:30
Storing the speech as text is a good idea. At 150 words/minute, 5 GB would hold 168 years' worth of 8-bit text at 1 hour per day.

SP
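
The text-storage estimate can be sketched the same way. The post gives the speaking rate (150 words/minute) and one byte per character, but not the average word length, so the 10 bytes per word used below (letters plus spacing and punctuation) is purely an assumption; with it and binary gigabytes the result lands in the same neighbourhood as the quoted 168 years. The function name is illustrative.

```python
GB = 2**30  # bytes; binary prefix is an assumption


def text_years(capacity_bytes, words_per_minute, bytes_per_word, hours_per_day=1):
    """Years of transcribed speech that fit in capacity_bytes, given a
    speaking rate, an average stored size per word, and hours spoken per day."""
    bytes_per_day = words_per_minute * 60 * hours_per_day * bytes_per_word
    return capacity_bytes / bytes_per_day / 365


# bytes_per_word=10 is an assumed average, not a figure from the thread.
speech_years = text_years(5 * GB, words_per_minute=150, bytes_per_word=10)
print(f"{speech_years:.0f} years of 1 h/day speech as 8-bit text")
```

A shorter assumed word length stretches the figure further, so plain text is several orders of magnitude cheaper to store than the audio it came from.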

Deacon
22-04-2004, 06:00:49
x86 will probably be extended for a little while longer. I imagine that it'll always be around either in compatibility mode or emulation. Itanium was supposed to eventually make it to desktops, but Intel's hopes for the Itanium have proven to be too optimistic.

No longer Trippin
22-04-2004, 07:08:26
I can see x86 coming down in ten years, as it is incredibly hard to work around its shortcomings. All those extra transistors to work around the problems translate into heat. I can't see a change from x86 happening until there are multicore processors, though, as people are going to expect no performance hit with the change, so emulation is out for at least the first generation or two; afterwards the speed gains would probably outweigh it enough to run it with emulation.