Hundreds of millions of these, each day:
BGP4MP|980099497|A|126.96.36.199|3333|188.8.131.52/16|3333 5378 286 1836|IGP|184.108.40.206|0|0||NAG||
The internet works because of these (and cables and routers and money and people and stuff).
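To make the record above easier to read, here is a minimal sketch that splits one pipe-delimited line into named fields. The field names are an assumption: they follow the layout commonly described for one-line BGP dump output and are not given on the slide. `parse_update` is a hypothetical helper.

```python
# Assumed field order for one pipe-delimited BGP4MP line (not stated on the slide).
FIELDS = [
    "type", "timestamp", "kind", "peer_ip", "peer_as", "prefix", "as_path",
    "origin", "next_hop", "local_pref", "med", "community",
    "atomic_aggregate", "aggregator",
]

def parse_update(line):
    """Split one record into a dict keyed by the assumed field names."""
    values = line.rstrip("\n").split("|")
    record = dict(zip(FIELDS, values))  # extra trailing empty fields are dropped
    record["timestamp"] = int(record["timestamp"])
    record["peer_as"] = int(record["peer_as"])
    # Assumes a plain path of AS numbers (no AS sets in braces).
    record["as_path"] = [int(asn) for asn in record["as_path"].split()]
    return record

line = ("BGP4MP|980099497|A|126.96.36.199|3333|188.8.131.52/16|"
        "3333 5378 286 1836|IGP|184.108.40.206|0|0||NAG||")
update = parse_update(line)
print(update["prefix"], update["as_path"])
```

At hundreds of millions of records a day, a per-line function like this is the kind of step that would run inside a distributed job rather than on one machine.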
[Diagram: HDFS architecture. Labels: Name Node (/some/file, /foo/bar); HDFS client (create file, write data, read data); three Data Nodes, each with local DISKs (replicate); node-local HDFS client (read data).]
Why?
scalable storage and processing in one
good for analytics: schema-less, unstructured
open source
cost-efficient
Not for me...
I don’t have a lot of data.
I surely don’t have a cluster of machines to spare.
I just read the paper.
It’d be cool if I could try this stuff sometime, though...
Step 3: What’s up with Microsoft?
Twitter mentions
MAP
Input: text
“Hello, Oracle”
“Google vs. Microsoft vs. Apple”
“Apache rocks! Oracle not so much...”
“Apple == iAwesome”
Split words; emit “$WORD, 1” for ‘interesting’ words:
Oracle, 1
Google, 1
Microsoft, 1
Apple, 1
Apache, 1
Oracle, 1
Apple, 1
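The map step on this slide can be sketched in a few lines of Python. This is a plain-Python illustration, not code for any particular MapReduce framework. The set of 'interesting' words is an assumption inferred from the slide's example output, and `map_text` is a hypothetical name.

```python
import re

# Assumption: the slide does not list the 'interesting' words;
# this set is inferred from the pairs it shows being emitted.
INTERESTING = {"apache", "apple", "google", "microsoft", "oracle"}

def map_text(text):
    """Split one input text into words and yield (word, 1) for interesting ones."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in INTERESTING:
            yield word, 1

texts = [
    "Hello, Oracle",
    "Google vs. Microsoft vs. Apple",
    "Apache rocks! Oracle not so much...",
    "Apple == iAwesome",
]
pairs = [pair for text in texts for pair in map_text(text)]
for word, count in pairs:
    print(f"{word}, {count}")
```

Each input text is handled independently of the others, which is what lets the map step be spread across many machines.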
20110807 apache 2
20110807 google 422
20110807 microsoft 44
20110807 oracle 11
20110808 apache 25
20110808 google 1341
20110808 microsoft 160
20110808 oracle 37
20110809 apache 17
20110809 google 1431
20110809 microsoft 184
20110809 oracle 40
20110810 apache 12
20110810 google 1688
20110810 microsoft 179
20110810 oracle 51
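Rows like these could come from a reduce step that sums the emitted 1s per key. The sketch below assumes the key is a (date, word) pair, which the output format suggests but the slides do not state. The input pairs are toy values for illustration, not the real Twitter data, and `reduce_counts` and `format_rows` are hypothetical names.

```python
from collections import Counter

def reduce_counts(pairs):
    """Sum the counts emitted by the map step for each (date, word) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def format_rows(totals):
    """Render totals as 'date word count' lines, sorted by date, then word."""
    return [f"{date} {word} {n}" for (date, word), n in sorted(totals.items())]

# Toy mapper output (illustrative only).
mapped = [
    (("20110807", "apache"), 1),
    (("20110807", "oracle"), 1),
    (("20110807", "apache"), 1),
]
rows = format_rows(reduce_counts(mapped))
print("\n".join(rows))
```

Because addition is associative, partial sums can be computed per machine and combined afterwards without changing the totals.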
Step 4: Pay
From: firstname.lastname@example.org
Subject: AWS Billing Statement Available
Greetings from Amazon Web Services,
This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:
Total: $218.02
Thank you for using Amazon Web Services.
Sincerely,
Amazon Web Services