Hadoop development in China Mobile Research Institute
4. China Mobile Big Cloud Technical Architecture
[Figure: layered Big Cloud architecture. Top to bottom: CMCC IT Supporting Systems, Internet Apps, IDC, ...; Cloud Storage (BC-NAS), Data Mining (BC-PDM), Search Engine (BC-SE); Structured Data Storage (HugeTable); Hadoop MapReduce with CMRI Extension; Object Storage (oNest) and Distributed Filesystem based on Hadoop HDFS; Elastic Computing (BC-EC): Linux, Xen/KVM. System Management (CloudMaster) and CloudSecurity are shown alongside the layers.]
Application
• IT Supporting System of CMCC
• IDC and Internet Applications
Enabler
• BC-PDM: Cloud-based Data Mining
• BC-NAS: File and Object Storage with web interface and REST API
• BC-SE: Search Engine
Platform
• MapReduce & HDFS: based on Hadoop, with some extensions by CMRI
• HugeTable: Structured Storage with SQL interface
• oNest: Object Storage for Web Apps
• CloudMaster: System Management
Resource
• PC Server and SATA Disk based
• BC-EC: IaaS based on OpenNebula
• Based on FOSS: Linux, KVM, Xen
5. Big Cloud and Hadoop
[Figure: the Big Cloud architecture diagram from the previous slide, annotated with the Hadoop-related work listed below.]
Development based on Hadoop
• Parallel ETL and Data Mining based on MapReduce
• Search Engine based on MapReduce
• HugeTable (structured data storage for data warehouse) based on Hive, HBase & MR
Development extending Hadoop
• Volume Management of DataNode in HDFS
• NameNode Cluster for HDFS
• Multi-queue scheduler with queue priority enhancement
External facilities for Hadoop
• Test tools for Hadoop HDFS
• Inside-job performance evaluation tool
• MapReduce Job Submission Web Interface
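The slides name a "multi-queue scheduler with queue priority enhancement" but do not describe how it works. The sketch below is only one plausible reading, not CMRI's implementation: jobs are kept in several named queues, each queue carries a priority, and the scheduler always serves the highest-priority queue that has pending jobs, in FIFO order within that queue. The class names, the "larger number wins" convention, and the queue names in the usage note are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobQueue:
    name: str
    priority: int  # assumption: a larger value is served first
    jobs: deque = field(default_factory=deque)


class MultiQueueScheduler:
    """Toy scheduler: strict priority across queues, FIFO inside a queue."""

    def __init__(self):
        self.queues = {}

    def add_queue(self, name, priority):
        self.queues[name] = JobQueue(name, priority)

    def submit(self, queue_name, job):
        # Jobs join the tail of their queue.
        self.queues[queue_name].jobs.append(job)

    def next_job(self):
        # Only queues that currently hold jobs compete for the slot.
        candidates = [q for q in self.queues.values() if q.jobs]
        if not candidates:
            return None
        best = max(candidates, key=lambda q: q.priority)
        return best.jobs.popleft()
```

With a hypothetical "prod" queue at priority 10 and a "batch" queue at priority 1, every pending "prod" job is returned by `next_job()` before any "batch" job. Strict priority like this can starve low-priority queues, so a production scheduler would usually add capacity limits or aging on top.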
6. Development on Hadoop in CMRI
Contributing to Mainline
• Online Volume Management of DataNode (by Wang Xu et al., HDFS-1362)
Off-Tree and Open-Sourced
• NameNode Cluster for HA (by Wang Xu, hosted on GitHub)
Off-Tree and Not Maintained
• hdfs-fuse (by Zhao Peng, hosted on Google Code)
• Multi-queue scheduler with queue priority enhancement (by Guo Leitao)
External Facilities
• hadoop-test (by Wang Xu, hosted on Google Code)
• MapReduce Job Submission Web Interface and inside-job performance evaluation tool (by Guo Leitao et al.)
Bug Fixes
7. DataNode Online Volume Management
http://github.com/gnawux/hadoop-cmri
https://issues.apache.org/jira/browse/HDFS-1362
Current State:
• A disk failure leads to decommissioning of the whole node
Online Volume Management:
• Online removal of the failed disk
• Migrate the data on the failed volume if it is still readable
• Replace the disk online
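The three steps above can be illustrated with a small in-memory model. This is a hedged sketch of the idea only, not the HDFS-1362 patch: the `Volume` and `DataNode` classes, their methods, and the "least-loaded volume" placement rule are hypothetical and do not correspond to Hadoop's Java APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    path: str
    blocks: dict = field(default_factory=dict)  # block id -> block data
    readable: bool = True  # False models a disk that can no longer be read


class DataNode:
    """Toy DataNode that keeps serving while volumes are swapped."""

    def __init__(self, volumes):
        self.volumes = {v.path: v for v in volumes}

    def remove_failed_volume(self, path):
        """Detach one volume without stopping the node.

        Returns (migrated, lost): block ids copied to healthy volumes
        and block ids that could not be recovered locally.
        """
        failed = self.volumes.pop(path)
        migrated, lost = [], []
        for block_id, data in failed.blocks.items():
            if failed.readable and self.volumes:
                # Assumption: place each block on the least-loaded volume.
                target = min(self.volumes.values(), key=lambda v: len(v.blocks))
                target.blocks[block_id] = data
                migrated.append(block_id)
            else:
                lost.append(block_id)
        return migrated, lost

    def add_volume(self, volume):
        # Models swapping in a replacement disk while the node stays up.
        self.volumes[volume.path] = volume
```

In this model the node never leaves service: a readable failed volume is drained onto the remaining disks, and an unreadable one simply reports its blocks as lost. In a real HDFS cluster, blocks lost this way would normally be restored from replicas held on other DataNodes.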