Dispatching Petabytes
   with PostgreSQL
     Andrew Pantyukhin
 andrew@dreamindustries.ru
15M media objects
         3PB raw data
storage, streaming, processing
HDFS? Isilon?
 custom solution
1000s of hard drives
   file system per drive
 filename = sha256(file)
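The naming scheme in one line of SQL, purely illustrative (pgcrypto's digest() assumed; in practice the hash is computed outside the database as files land on disk):

  create extension if not exists pgcrypto;
  select encode(digest('example file bytes'::bytea, 'sha256'), 'hex') as filename;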
dispatching
ingestion, rebalancing
  encoding, analysis
PostgreSQL!
  (of course)
entities
sha (asset), hdd, chassis
metadata, actions, status
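A hypothetical schema sketch for these entities; table and column names are assumptions, not the production DDL (the real sha column uses a dedicated hash type, see the custom types slide):

  create table chassis (id      serial primary key,
                        name    text not null);
  create table hdd     (id      serial primary key,
                        chassis int  references chassis,
                        status  text not null);
  create table asset   (sha     bytea primary key,   -- sha256 of the master file
                        meta    xml);                 -- third-party metadata
  create table copies  (sha     bytea references asset,
                        hdd     int   references hdd,
                        primary key (sha, hdd));
  create table action  (id      bigserial primary key,
                        sha     bytea references asset,
                        kind    text not null,        -- ingest, encode, analyze
                        status  text not null,
                        t       timestamptz);         -- claim / heartbeat time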
15M master objects
    25M derivatives
      70M copies
200GB core
500GB XML processing
    2TB+ overall
custom types
      enum
 native/wrappers
hashtypes
    shatypes
+ crc32, bugfixes
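Hedged sketch of the custom types: a native enum for status, plus a dedicated hash type assumed to come from the hashtypes/shatypes extensions named above (bytea is the plain-PostgreSQL fallback):

  create type action_status as enum ('queued', 'running', 'done', 'failed');
  -- with hashtypes installed, a first-class sha256 type stores 32 bytes
  -- instead of a 64-character hex string (type name assumed):
  -- create extension hashtypes;
  -- create table asset (sha sha256 primary key, ...);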
actions
fully async, fail-over
    dumb polling
smart locking
update set t = now() where t is old
      update returning
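A minimal sketch of the dumb-polling / smart-locking idea against the assumed action table above (the 10-minute timeout is made up): each worker polls, claims one queued or stale row by bumping its timestamp, and gets the claimed row back through returning. The staleness condition doubles as the lock: a concurrent claimer re-evaluates it under the row lock and updates nothing.

  update action
     set t = now(), status = 'running'
   where id = (select id
                 from action
                where status = 'queued'
                   or (status = 'running'
                       and t < now() - interval '10 minutes')
                order by id
                limit 1)
     and (status = 'queued'
          or t < now() - interval '10 minutes')  -- re-checked under the row lock
  returning id, sha, kind;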
XML
  third-party metadata
stored, processed in PG
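The xml column can be queried in place with PostgreSQL's built-in xpath(); element and column names here are made up:

  select sha,
         (xpath('/metadata/title/text()', meta))[1]::text as title
    from asset
   where meta is not null;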
research
large-scale action logging
production
aggregated views of dispatcher
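Production and research read hypothetical aggregated views like this rather than the raw dispatcher tables:

  create view action_summary as
  select kind, status, count(*) as actions, max(t) as last_seen
    from action
   group by kind, status;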
distributed logic
dispatcher, XML processing,
    production, research
  full-mesh data exchange
table data transfer:
    slow or inflexible
simple custom scripts, diff
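Sketch of the script-based exchange, assuming psql and the tables above; the receiving side diffs the snapshot against its own copy and applies only the changed rows:

  -- sender: export a stable, sorted snapshot of one table
  \copy (select sha, hdd from copies order by 1, 2) to 'copies.csv' csv
  -- receiver: load into a staging table, diff, apply inserts/deletes
  -- (a few lines of shell; full table reloads proved slow or inflexible)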
dream industries
disruptive innovation lab
 funding, collaborating
     inviting, hiring

PetaPG