A metaphor that came to mind is that we need to be building the Weasleys' clock, not the Marauder's Map.
Update: Jorge points out that Microsoft Researchers in Cambridge have built a prototype.
Technorati Tags: digital rights, ethics, geo
This page was generated by translation through the Osaka dialect filter.
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
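As a quick sanity check on the figures above, the counts can be combined into a few derived ratios. This is only a back-of-the-envelope sketch in Python: the variable names are made up for illustration, and the "repeat captures" figure assumes that every capture beyond the first of a given URL counts as a repeat.

```python
# Figures copied from the data set summary above.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069

# Assumption: any capture beyond the first for a given URL is a "repeat".
repeat_captures = captures - unique_urls

# Simple averages; they say nothing about how skewed the distribution is.
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts

print(f"repeat captures:      {repeat_captures:,}")
print(f"captures per URL:     {captures_per_url:.2f}")
print(f"unique URLs per host: {urls_per_host:.1f}")
```

These are averages only; a handful of very large hosts could account for much of the total.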
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
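To make "respected robots.txt directives" concrete, here is a minimal sketch of the kind of check a polite crawler performs before fetching a URL. It uses Python's standard `urllib.robotparser`, not Heritrix's own implementation, and the robots.txt rules, the `example-crawler` user agent and the example.com URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url, user_agent="example-crawler"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# An ordinary page is permitted; anything under /private/ is not.
print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/report.html"))
```

A real crawler would fetch and cache each host's robots.txt once instead of re-parsing it on every call.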
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, since the URLs for these resources were added to queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so results broken down by country will be somewhat skewed.
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.
Edifying exquisite equine entrapments
Technorati Tags: digital rights, ethics, geo
The glaring omission in that report is of course podcasts, which have shown huge growth, and have been part of the iPod and iTunes experience for over a year now.
On average, the study reports, only 5% of the music on an iPod will be bought from online music stores. The rest will be from CDs the owner of an MP3 player already has or tracks they have downloaded from file-sharing sites.
The report warned against simple characterisations of the music-buying public that divide people into those that pay and those that pirate.
Even though I got $250 of credit on the iTunes Music Store from ValleyWag, we have been reluctant to spend it, preferring to buy CDs from Amazon. The need to burn your own CDs after purchase (to be sure that the tracks don't vaporize next time you have disk trouble) is a significant extra burden, driven by the DRM. Apple's sync-back from iPods to computers in the new iTunes is a step in the right direction, but a more sensible policy towards failed or deleted downloads is long overdue - failed TV show downloads and purchases lost through disk failure meet with shrugs from Apple.
From Apple's point of view, the iTunes store is a small part of their business - the bulk of the money passing through it goes straight to the rights-holders or in payment processing or bandwidth costs, while they make far more revenue and profit on the iPods themselves. Overall this is a good thing - if Apple were really beholden to the labels and studios for significant revenue, then online culture would be in worse trouble. As it is, Apple's neutrality means that they are happy to encourage podcasters to show up in their listings, as more media means more iPod sales.
Technorati Tags: audio, iPod, iTunes, podcasting, video
lilo is the executive director of Peer-Directed Projects Center in Houston, Texas & he's another boring cooperativist propertarian Peircean pragmatist anarchist & he runs freenode (http://freenode.net/) & certainly hasn't been getting more sleep lately & is working on freenode-registry in Ruby & blogs on http://spinhome.org/ & http://bloggage.org/ & also uses the nick 'somegeek' & passed away Sep 16th, 2006 (RIP)
Technorati Tags: BBC, culture, Douglas Adams, Hyperland, hyperlinks, Live TV is Dead, movie, video
Technorati Tags: Apple, DRM, HD, PodCamp, podcasting, Rocketboom, Showtime, video
Technorati Tags: Barcamp, BloggerCon, microformats, PodCamp, podcasting, technorati, video
IPTV is interesting not because of streaming, but because of on-demand possibilities a la iPod
IPTV is interesting because of interpretations of packets v. dumb raster display
Technorati Tags: podcasting, streaming, video
This is my personal blog. Any views you read here are mine, and not my employer's.
encourage copying, expect payment