Blue Whale Clustered file system
Blue Whale Clustered file system (BWFS) is a shared disk file system (also called a clustered file system, shared storage file system, or SAN file system) made by Tianjin Zhongke Blue Whale Information Technologies Company in China.
Overview
BWFS enables simultaneous file access across heterogeneous platforms, as well as high-performance file creation, storage, and sharing. BWFS is installed on hosts that are connected to the same disk array in a storage area network (SAN). Client systems do not need to run the same operating system to access a shared BWFS file system. As of January 2010, client software was available for Microsoft Windows, Linux, and Mac OS X.
BWFS can combine multiple Fibre Channel (FC) or iSCSI disk arrays into a storage cluster that lets multiple servers process data in parallel, provides a high-performance, scalable file-sharing service, and supports multi-machine workflows and applications in a cluster environment.
BWFS uses direct data access: clients read and write shared file data directly on the FC or iSCSI disk arrays over the SAN, bypassing any file server or NAS head, which takes full advantage of the SAN's high bandwidth. BWFS can greatly increase a system's capacity for concurrent file processing without changing the front-end application environment or the back-end SAN configuration.
BWFS supports a redundant MetaData Controller (MDC) configuration, which provides high performance and high availability and, combined with the SAN infrastructure, delivers enterprise-level reliability and data security.
Data access process
BWFS supports heterogeneous operating system platforms, allowing multiple servers to access the same set of disks and files concurrently regardless of their native file systems. Currently, BWFS supports a variety of enterprise-class Linux platforms as well as Windows 2000, Windows XP, and Windows Server 2003. BWFS provides a separate client program for each operating system; each client recognizes the BWFS shared file system, provides access to it, and ensures that the file system is presented consistently across operating systems, so that I/O requests are handled correctly.
When multiple servers access the same file system concurrently, a mechanism is needed to prevent two servers from writing to the same disk location. It must also ensure that a server reading a file does not see inconsistent content while another server is updating that file. In BWFS, this mechanism is provided by the MetaData Controller.
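The coordination rule described above (one writer at a time, and no reads of a file that another host is updating) can be illustrated with a toy shared/exclusive lock table. This is a minimal sketch under stated assumptions, not the actual BWFS lock protocol: the `MetadataController` class and its methods are hypothetical, everything runs in one process, and a refused request simply returns `False` where a real MDC would more likely queue it or make the client wait.

```python
from collections import defaultdict


class MetadataController:
    """Hypothetical toy MDC lock table: many readers OR one writer per file."""

    def __init__(self):
        self.readers = defaultdict(set)  # path -> hosts holding shared locks
        self.writer = {}                 # path -> host holding the exclusive lock

    def acquire_read(self, host, path):
        # Refuse a reader while another host is updating the file,
        # so it never sees a half-written version.
        if path in self.writer and self.writer[path] != host:
            return False
        self.readers[path].add(host)
        return True

    def acquire_write(self, host, path):
        # A writer needs exclusive access: no other writer and no other readers.
        other_readers = self.readers[path] - {host}
        if (path in self.writer and self.writer[path] != host) or other_readers:
            return False
        self.writer[path] = host
        return True

    def release(self, host, path):
        # Drop whatever lock this host holds on the file.
        self.readers[path].discard(host)
        if self.writer.get(path) == host:
            del self.writer[path]


mdc = MetadataController()
print(mdc.acquire_write("server-a", "/video/clip.mxf"))  # True
print(mdc.acquire_write("server-b", "/video/clip.mxf"))  # False: A is writing
print(mdc.acquire_read("server-b", "/video/clip.mxf"))   # False: A is writing
mdc.release("server-a", "/video/clip.mxf")
print(mdc.acquire_read("server-b", "/video/clip.mxf"))   # True
```

Keeping the lock table on the MDC rather than on the disks means every host consults a single authority, which is what prevents two servers from being granted the same region at once.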
The MDC coordinates the servers' access to the BWFS file system but sits outside the read/write path for file data. Clients communicate with the MDC over a separate IP link to obtain file locations and data-block allocation information, and then read and write the disks directly at block level over the SAN. In technical terms, this design is called an "out-of-band" or "asymmetric" architecture.
The data access process can be broken down as follows:
- The application issues a write request.
- The BWFS client sends a request for the operation to the MDC over the LAN.
- The MDC processes the request and replies over the LAN, telling the client which disk blocks the data can be written to.
- The BWFS client writes the data directly to those blocks over the SAN at line speed.
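The four steps above can be sketched as a small in-process simulation. It assumes, for illustration only, that the MDC hands out whole blocks from a free list and that a `bytearray` stands in for the shared LUN; the class names `MDC` and `Client`, the block size, and the allocation policy are all hypothetical. What it shows is the out-of-band split: the MDC only handles block numbers (the control path), while the file bytes travel straight from the client to the disk (the data path).

```python
BLOCK_SIZE = 4  # tiny block size, for illustration only


class MDC:
    """Hypothetical metadata controller: owns the free-block map and file layouts."""

    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))
        self.layout = {}  # path -> list of block numbers

    def allocate(self, path, nblocks):
        # Steps 2-3: the request arrives over the LAN; the MDC picks free
        # blocks, records them in the file's layout, and returns their numbers.
        blocks = [self.free.pop(0) for _ in range(nblocks)]
        self.layout.setdefault(path, []).extend(blocks)
        return blocks


class Client:
    def __init__(self, mdc, san_disk):
        self.mdc = mdc         # control path (LAN)
        self.disk = san_disk   # data path (SAN), shared by all clients

    def write(self, path, data):
        # Step 1: the application issues a write request.
        nblocks = -(-len(data) // BLOCK_SIZE)  # ceiling division
        blocks = self.mdc.allocate(path, nblocks)
        # Step 4: the bytes go straight to the shared disk; the MDC never sees them.
        for i, b in enumerate(blocks):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.disk[b * BLOCK_SIZE:b * BLOCK_SIZE + len(chunk)] = chunk
        return blocks


disk = bytearray(16 * BLOCK_SIZE)  # stands in for a shared LUN
mdc = MDC(total_blocks=16)
print(Client(mdc, disk).write("/a.dat", b"hello world"))  # [0, 1, 2]
```

A second `Client` sharing the same `MDC` and `disk` would be given different block numbers, so the two clients' data cannot land on the same blocks.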
BWFS is designed for SAN environments, allowing a large number of servers or workstations connected to an FC SAN or IP SAN (iSCSI) to access the same file system directly. With FC, a BWFS client can use one or more FC links to access disk resources, so the I/O performance of a single server can be scaled from just over 100 MB/s to several GB/s simply by adding FC HBAs.
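The scaling claim amounts to simple arithmetic: a client's throughput is roughly the number of FC links times the throughput of each link, until the disk arrays behind the SAN become the bottleneck. The sketch below uses assumed figures (about 400 MB/s per link, roughly the nominal rate of a 4 Gb/s FC link, and 3000 MB/s for the arrays); they are illustrative, not BWFS measurements, and the function name is hypothetical.

```python
def client_bandwidth_mb_s(n_links, per_link_mb_s, storage_limit_mb_s):
    """Rough upper bound on one client's throughput (illustrative model).

    Assumes traffic is spread evenly over all FC links and that the disk
    arrays behind the SAN cap what any client can reach.
    """
    return min(n_links * per_link_mb_s, storage_limit_mb_s)


# Assumed figures: ~400 MB/s per link, arrays able to deliver 3000 MB/s.
for n in (1, 2, 4, 8, 16):
    print(n, client_bandwidth_mb_s(n, 400, 3000))
```

With these numbers the bound grows from 400 MB/s with one link to 1600 MB/s with four, then flattens at 3000 MB/s from eight links onward, which is why the performance of the disks themselves matters as well.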
The overall performance of a system depends not only on the hosts and the network but also on the performance of the disks that make up the file system. For this reason, a BWFS file system can be built from LUNs on multiple disk arrays. This effectively adds another RAID layer across the disk arrays, making the most of their combined performance.
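Building one file system from LUNs on several arrays works like RAID 0 striping one level up. The sketch below assumes a simple round-robin layout in which consecutive stripe units rotate across the LUNs; the source does not describe BWFS's actual placement policy, so the function and its parameters are hypothetical.

```python
def stripe_location(logical_block, n_luns, stripe_blocks):
    """Map a file-system block to (LUN index, block within that LUN).

    RAID-0-style layout (an assumption): consecutive stripe units of
    `stripe_blocks` blocks rotate across the LUNs.
    """
    stripe_unit, offset = divmod(logical_block, stripe_blocks)
    lun = stripe_unit % n_luns          # which array/LUN holds this unit
    row = stripe_unit // n_luns         # how many full rotations precede it
    return lun, row * stripe_blocks + offset


# Three LUNs (e.g. one per disk array), stripe unit of 2 blocks.
print([stripe_location(b, n_luns=3, stripe_blocks=2) for b in range(8)])
# → [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (0, 2), (0, 3)]
```

Consecutive stripe units land on different arrays, so a large sequential read or write is served by all of them in parallel.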
Another performance factor to consider is where metadata is stored. A file consists of its actual data and its metadata: the data is the file's content, while the metadata includes file attributes, permissions, and so on. Whenever a file is created, modified, or deleted, its metadata must be updated, so processing a file involves accessing both its data and its metadata. Large files are usually read and written sequentially, whereas reaching the metadata requires moving the disk head to another location. Because a disk performs sequential reads and writes much faster than random ones, storing data and metadata on the same disk (as most file systems do) makes large-file access more random and thus reduces read and write performance. For this reason, BWFS places metadata on a different disk or volume, separating sequential file reads and writes from random metadata accesses. Since the two no longer interfere with each other, I/O bandwidth is kept as high as possible.
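The effect of mixing metadata into a sequential stream can be shown with a toy head-movement count. The model is an assumption made for illustration: block addresses stand in for head positions, any jump to a non-adjacent address counts as one seek, and a metadata update is written every ten data blocks to a separate region of the same disk. Real disks, caches, and I/O schedulers are far more complicated, and all names here are hypothetical.

```python
def count_seeks(addresses):
    """Count jumps to a non-adjacent block address (a crude seek model)."""
    return sum(1 for prev, cur in zip(addresses, addresses[1:]) if cur != prev + 1)


DATA_START, META_START = 0, 1_000_000  # hypothetical regions on the disk
N_DATA, META_EVERY = 1000, 10          # one metadata update per 10 data blocks

shared = []     # data and metadata on the same disk
data_only = []  # data disk when metadata lives on another volume
meta_ptr = META_START
for i in range(N_DATA):
    shared.append(DATA_START + i)
    data_only.append(DATA_START + i)
    if (i + 1) % META_EVERY == 0:
        shared.append(meta_ptr)  # head leaves the data region...
        meta_ptr += 1            # ...and must come back for the next data block

print(count_seeks(shared))     # 199
print(count_seeks(data_only))  # 0
```

Interleaving 100 metadata updates into 1,000 sequential data writes costs 199 seeks on a shared disk, while the data disk in the separated layout streams with none.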
In addition, once data and metadata are separated, they can be processed independently on different hosts, and metadata traffic does not consume bandwidth on the data channel. This increases the concurrency of data and metadata operations and further improves file system performance.
Commercialization
A 2006 Gartner publication said:
"BWFS, an Internet Protocol (IP) cluster file system (CFS), has moved beyond the research lab and into the commercialization stage, and has now been successfully deployed in various industries including the energy, automotive, military and the media sectors. Its success demonstrates the strengths of China's research institutes in the technology realm, despite their relative lack of commercial experience and investment resources compared to many Western technology providers. Although CFSs are not yet prevalent in the mainstream storage market, for some users who need very high input/output I/O performance — especially leading-edge applications such as oil and gas, biotech and computer-aided design (CAD) — BWFS offers a good price/performance solution. Users should also consider BWFS if looking for a lower-priced CFS. Users that need a more commercialized solution — or that like to have a more “out of box” interface — should consider other vendors such as Panasas, Isilon and Ibrix rather than BWFS."
BWFS was developed at the National Research Centers for High Performance Computers of the Chinese Academy of Sciences. In 2007, FalconStor announced a joint venture to sell the software.
The joint venture was named Tianjin Zhongke Blue Whale Information Technologies Company, located in Tianjin, China.
Venture capital firm VantagePoint Capital also made an investment.
It was announced that BWFS would be used for video from a satellite intended to cover the 2008 Summer Olympics.
Further reading
- Zhenhan Liu, Xiaoxuan Meng, Lu Xu. "Lock management in Blue Whale File System." In Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human (ICIS 2009), 2009.
- "A Storage Slab Allocator for Disk Storage Management in File System." NAS '09, 2009.
- Lu Xu, Hongyuan Ma, Zhenjun Liu, Huan Zhang, Shuo Feng, Xiaoming Han. "Experiences with Hierarchical Storage Management Support in Blue Whale File System." In 2010 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 369–374, 2010.