|
Parallelization & Smart Objects |
|
|
|
|
Written by martcon
|
|
Monday, 17 May 2010 13:32 |
|
In our previous blog, we discussed Horizontal Scaling and how scaling across multiple computer servers is a key feature of Cloud Computing and has potential benefits for smart objects and smart networks. Another concept which goes hand in hand with horizontal scaling is parallelization. With the advent of Cloud Computing, the scale and implementation of parallelization have changed. Parallelization can increase the speed of software operations or reduce response time. On a micro scale, Vertical Scaling can be used on symmetric multiprocessors to spawn multiple program threads.
However, as the Sun Microsystems White Paper on Cloud Computing Architecture points out, vertical scaling only has as much parallel processing capability as the server has processors (or cores) - or, at least, as many cores as have been purchased and allocated to a particular Virtual Machine (VM). This matters because today's computing environments are shifting towards x86-architecture servers with two or four processor sockets (i.e. the motherboard connectors that each hold one physical processor). It is for this reason that parallelization should also be considered on a more macro scale: software that can use parallelization across many servers can scale to potentially thousands of servers. This offers far greater potential for scalability than was possible with symmetric multiprocessing.
In the traditional physical world of computing, parallelization has frequently been implemented using load balancers or content switches that distribute incoming requests from software programs across a number of servers. Similarly, parallelization in a cloud computing world can be implemented with a load balancing application or a content switch, in this case distributing incoming requests across a number of virtual machines. In both scenarios, applications can be designed to recruit additional resources to accommodate workload spikes.
The classic example of parallelization with load balancing is a number of stateless web servers (i.e. servers that treat each request as an independent transaction that is unrelated to any other request) where the incoming workload is distributed across a pool of servers. Of course, there are many other ways to use parallelization in Cloud Computing environments. For example, a Cloud Computing application that uses a significant amount of CPU time to process user data might use a scheduler to receive jobs from users. The scheduler places the data into a repository, starts a new VM for each job and hands the VM a token that allows it to retrieve the data from the repository. When the VM has completed its task, it passes a token back to the scheduler, which allows the scheduler to return the completed work to the user, and the VM then terminates.
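The scheduler pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for VMs, a dictionary stands in for the data repository, and the names `repository`, `worker_vm` and `schedule` are hypothetical rather than taken from any particular cloud platform.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the shared data repository.
repository = {}


def worker_vm(input_token):
    """Stand-in for a VM: fetch data by token, process it, store the result."""
    data = repository.pop(input_token)      # retrieve the job's data
    result = sum(x * x for x in data)       # placeholder for CPU-heavy work
    result_token = uuid.uuid4().hex
    repository[result_token] = result       # store the completed work
    return result_token                     # the "VM" then terminates


def schedule(jobs):
    """Place each job's data in the repository and start one worker per job."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = []
        for data in jobs:
            token = uuid.uuid4().hex
            repository[token] = data
            futures.append(pool.submit(worker_vm, token))
        # Exchange each returned token for the finished result.
        return [repository.pop(f.result()) for f in futures]
```

Calling `schedule([[1, 2, 3], [4, 5], [6]])` returns `[14, 41, 36]`. In a real deployment the worker would be a separately provisioned VM and the tokens would be access credentials for a storage service, but the flow of tokens is the same.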
Applications can be parallelized only to the extent that their data can be partitioned so that independent systems can operate on it in parallel. Any credible application architecture should include a plan for dividing and conquering data. The partitioning of data has a significant impact on the volume of data transferred over networks. There are several examples of parallelization that leverage data partitioning. We have previously discussed Hadoop (http://hadoop.apache.org). As noted previously, this is an implementation of the MapReduce design pattern which is itself an implementation of the master/workers parallelization design pattern. Database sharding, which we discussed previously, can be accomplished through a range of partitioning techniques including vertical partitioning (i.e. partitioning by database table column), range-based partitioning (e.g. by date) and directory-based partitioning (i.e. partitioning by distinct domains). The approach taken really depends on how the data is to be used.
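To make two of these partitioning techniques concrete, here is a minimal Python sketch of range-based and directory-based shard selection. The shard names, split dates and domains are made up for illustration; a real system would load them from configuration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical split points and shard names, for illustration only.
RANGE_BOUNDS = [date(2010, 1, 1), date(2010, 7, 1)]   # sorted split dates
RANGE_SHARDS = ["shard_before_2010", "shard_2010_h1", "shard_2010_h2"]

# Hypothetical routing table mapping a data domain to a shard.
DIRECTORY = {"sensors": "shard_a", "meters": "shard_b"}


def shard_by_range(captured_on):
    """Range-based partitioning: pick the shard whose date range holds the row."""
    return RANGE_SHARDS[bisect_right(RANGE_BOUNDS, captured_on)]


def shard_by_directory(domain, default="shard_default"):
    """Directory-based partitioning: look the domain up in a routing table."""
    return DIRECTORY.get(domain, default)
```

For example, `shard_by_range(date(2010, 5, 17))` selects `shard_2010_h1`, while `shard_by_directory("meters")` selects `shard_b`. Range-based routing keeps rows that are close in time together, which suits date-bounded queries; a directory is more flexible but adds a lookup that must itself be kept available.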
Parallelization is also being used in the finance industry. Major financial institutions have refactored their fraud detection algorithms so that what was once a batch data-mining operation, in which patterns and trends were detected from large data sets, now runs on a large number of systems in parallel and provides real-time analysis of incoming data. Some High Performance Computing (HPC) applications that deal with three-dimensional data have been designed so that the state of one cubic volume of a gas, liquid or solid can be calculated for time t by one process. The state of that cube is then passed on to the parallel processes representing eight adjoining cubes and the state is calculated for time t+1.
The argument for the use of parallelization is therefore clear. The data management of smart objects and smart networks would also benefit from the adoption of a parallelization strategy as the volume of data and the conversion of that data into meaningful information may necessitate the use of parallelization techniques. The myriad of devices and the lack of standardization in packet formats and data transmission may lead to many different types of data packet listeners and data capture and interpretation software being needed. Consider the example of a system that captures data from a wireless sensor network (WSN) and a smart grid. The smart grid may transfer data to the system using 2.5G or 3G telecommunications while the WSN may transfer data using Zigbee. The packets would be in different formats, would contain different data and would require different software to capture and translate the packets. When one factors in the different Operating Systems (TinyOS, Contiki or indeed none in many cases) and Programming Languages (nesC, C++, Java among others) used, it is clear that bespoke software would be required for the different smart objects, be they sensors, smart meters, GPS readers or RFID tags. These data capture modules would ideally run in parallel so that data could be captured from these devices simultaneously thus providing a richer snapshot of the condition and activities taking place within the environment or infrastructure being monitored.
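As a rough sketch of data capture modules running in parallel, the Python below runs one decoder per feed and merges the readings. The two packet layouts are invented for this example and are not real Zigbee or smart grid formats; `decode_wsn`, `decode_meter` and `capture_all` are hypothetical names.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def decode_wsn(packet):
    """Made-up binary layout: 2-byte node id, 2-byte raw reading (big-endian)."""
    node_id, raw = struct.unpack(">HH", packet)
    return {"source": "wsn", "id": node_id, "value": raw}


def decode_meter(packet):
    """Made-up text layout: b'meter_id,kwh'."""
    meter_id, kwh = packet.decode("ascii").split(",")
    return {"source": "meter", "id": int(meter_id), "value": float(kwh)}


def capture_all(feeds):
    """Run one capture module per (decoder, packets) feed in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(feeds))) as pool:
        # Each worker decodes every packet of its own feed.
        batches = pool.map(lambda feed: [feed[0](p) for p in feed[1]], feeds)
        return [reading for batch in batches for reading in batch]
```

Each feed pairs a decoder with the packets it understands, so adding support for a new device type means adding one decoder rather than changing the capture loop.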
Partitioning strategies could also play a key role in conjunction with parallelization for the data management of smart objects. Smart networks (or smart dust) could comprise tens of thousands of computing devices. By adopting a mechanism by which data could be organised and partitioned by group, location or date captured, data could be distributed horizontally across the Cloud. Similarly, partitioning could be undertaken on a vertical basis where database table columns could be split logically.
Like the other aspects of Cloud Computing that we have discussed in previous blogs, parallelization is another technique that is helping to make Cloud Computing an enabling technology for the data management of smart objects. Vertoda provides data management and middleware that can be used in the Cloud to organize and store smart object data. We are also developing a platform that will greatly enhance the ability to capture data from the myriad of smart objects and manage this data both in Cloud and Enterprise Computing environments. |
|
|
Horizontal Scaling & Smart Objects |
|
|
|
|
Written by martcon
|
|
Thursday, 13 May 2010 11:27 |
|
Traditionally, software architects and developers would have expected their applications to run on a powerful server. Recently, however, the trend towards horizontal scaling has been increasing. Rather than expecting applications to run on highly scalable servers, developers have been redesigning (or refactoring) Information Systems and software applications so that they can scale horizontally across a number of computer servers. This refactoring of applications is not a trivial task as both the applications and the data captured, managed and stored by these applications must be designed so that both processing and data can be broken down into smaller chunks. It is this existing architectural trend that has been a key factor propelling the adoption of cloud computing.
There are examples of horizontal scaling in High Performance Computing (HPC), Database Management Systems, CPU-intensive processing and Data-intensive processing. Horizontal scaling had been used for HPC workloads long before the advent of cloud computing in a Grid Computing framework. Developers have refactored applications to achieve the distribution of HPC workloads across bare-metal compute grids. HPC has been used in many scientific applications. For example, scientists have broken down data for applications such as 3-D climate modelling so that it can be spread across a large number of servers. Grid computing is a predecessor to cloud computing as it uses tools to provision and manage multiple racks of physical servers so that they can all work together to solve a problem. As HPC is extremely demanding in terms of compute power, interprocess communication and input-output (I/O), such workloads would be most suitable for Clouds that provide Infrastructure As A Service (IaaS). Access to bare-metal servers or Type I Virtual Machines (VM) that provide more direct access to I/O devices would be specific examples.
As a sidebar we will define the terms bare-metal and Type I VMs. Bare-metal refers to the underlying physical architecture of a computer or server. Running an Operating System on bare-metal refers to running an unmodified version of the OS on the physical hardware. A Type I or Native VM refers to the scenario where the software layer that provides the virtualization for the VM runs on the bare hardware. Given that many HPC applications leverage the hardware directly for purposes of speed, it is clear that Bare-metal servers or Type I VMs would be suited for such applications.
Database Management Systems can also be adapted to run in cloud computing environments. Database servers can be horizontally scaled and database tables can be partitioned across the servers. This technique is known as sharding and allows multiple instances of database software, be it Oracle, MySQL, SQL Server or any other type of database, to scale performance in a cloud computing environment. Rather than accessing a single, central database, applications now access one of the many database instances depending on which shard contains the requested data.
CPU-intensive applications are also good candidates for horizontal scaling. Applications that perform intensive tasks such as frame rendering (the process of transforming logical objects such as points, lines etc. into physical representations) can create a separate VM to render each frame rather than creating a new programming thread, thus enhancing performance. Horizontal scaling is also suitable for data-intensive processing as large amounts of data can be processed in pieces and the results coalesced by a coordinating process. For example, Hadoop (http://hadoop.apache.org/), which we discussed in a previous blog, is an open source implementation of the MapReduce framework for processing huge datasets using multiple computers.
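The idea of processing data in pieces and coalescing the results can be shown with a small word count written in the MapReduce style. This is a plain Python sketch, not Hadoop's API: threads stand in for separate machines, and `map_chunk` and `word_count` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one partition of the input."""
    return Counter(word for line in lines for word in line.split())


def word_count(lines, partitions=4):
    """Split the input, map each partition in parallel, then merge the counts."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = pool.map(map_chunk, chunks)
        # Reduce step: the coordinating process coalesces the partial counts.
        return reduce(lambda a, b: a + b, partials, Counter())
```

Because each partition is counted independently, the map step needs no shared state, which is what allows it to be spread across many workers.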
The question we will ask in this blog is what role Horizontal Scaling can play for Smart Objects and Smart Networks. As we have illustrated in previous blogs, smart objects provide a rich new pool of real-time or near real-time data that will need to be processed and stored. Frequently, the data will need to be converted into meaningful information. For example, applications processing Wireless Sensor Network (WSN) data may have to apply complex mathematical formulae to convert measurements from engineering units to a more meaningful metric. When multiple instances of such data are arriving at a system every second it is clear that such applications may be both CPU and data intensive and would benefit from a horizontal scaling strategy. Similar logic also applies to storing smart object data. It may be difficult to predict the data storage requirements for smart networks at the outset. Rather than running the risk of a single database server becoming full, the data can be categorised and distributed across the cloud.
Using data from smart networks for tasks such as data mining, prediction formulation and pattern detection/analysis would certainly be CPU and data intensive, making such tasks candidates for an HPC application. HPC could also be used for simulating and testing smart networks. One example is the use of HPC by the US Army Redstone Technical Test Center to test Ad-Hoc Wireless Sensor Networks. Given the role of smart objects in environmental monitoring and scientific applications - for example, biosensors to detect the presence of chemicals or other agents or monitor human and animal health - one would expect the use of HPC to process smart object data to grow in the coming years. |
|
Cloud Computing Infrastructure Models & Smart Objects |
|
|
|
|
Written by martcon
|
|
Thursday, 06 May 2010 09:18 |
|
The Cloud Computing Infrastructure Models that are commonly defined are public, private and hybrid clouds. Each of these infrastructure models entails trade-offs. It should be noted that the terms public, private and hybrid do not dictate location. Public Clouds are typically available over the Internet and private clouds are usually located at the premises of an organisation. However, private clouds could also reside at a co-location facility and be accessed over the Internet.
There are a number of considerations for organisations with regard to the cloud computing model they select. Large organisations with comprehensive and complex IT needs may need to avail of more than one model to solve different problems. For example, an application that is only needed on a temporary basis might be best suited for deployment in a public cloud as this avoids the need to purchase additional equipment that will only be used temporarily. Similarly, it might be best to deploy an application that will be used permanently by the organisation in a private or hybrid cloud. This also applies where there are particular requirements regarding quality of service or the location of data.
Public Clouds are run by third parties and applications from different customers are likely to be mixed together on the cloud's computer servers, storage systems and networks. It is typical for public clouds to be hosted away from customer premises. The key value of public clouds is that they provide a way to reduce customer risk and cost by providing a temporary extension to enterprise infrastructure. As stated in Sun Microsystems' white paper on Cloud Infrastructure, the existence of other applications running in a public cloud should be transparent to both cloud architects and users if a public cloud is implemented with performance, security and data locality in mind.
One of the key benefits of public clouds is that they can be much larger than most organisations' private clouds could ever be. Public clouds offer the ability to scale up and down on demand while the risks associated with deploying and maintaining IT infrastructure are shifted from the enterprise to the cloud provider.
Portions of a public cloud can be carved out for the exclusive use of a single client, thus creating a virtual private data centre. Rather than being limited to deploying virtual machine (VM) images (i.e. a software implementation of a computer that executes programs like a physical computer), a virtual private data centre gives customers greater access and control over the IT infrastructure they're using. Customers can manipulate not just VM images but also servers, storage systems, network devices and network topology. Creating a virtual private data centre with all components located in the same facility lessens the issue of data locality and overcomes potential bandwidth bottlenecks as all resources are in the same physical location.
Private Clouds are built for the exclusive use of one client and provide that client with a high level of control over data, security and quality of service. The customer owns the infrastructure and has control over how applications are deployed on it. Private clouds may be deployed at the organisation's premises in, for example, an enterprise data centre or may be deployed at a co-location facility. An organisation's own IT department can build and manage their own private cloud or a cloud provider can perform this service. Sun Microsystems refer to this latter service as a 'hosted private' model where IT infrastructure is installed, configured and operated to support a private cloud for the organisation. The key value of the 'hosted private' model is the high level of control it gives organisations over the use of cloud resources while bringing in the expertise required to establish and operate the environment.
The final type of cloud infrastructure model is the hybrid cloud. This combines both the public and private cloud models and provides scalability from a third party cloud provider as and when an organisation requires it. The ability to extend a private cloud with the resources of a public cloud can be used to maintain service levels when workloads and data processing/storage requirements increase. Sun Microsystems cite the use of storage clouds (i.e. storing data over the cloud) to support Web 2.0 applications as a common example of the use of hybrid clouds.
A hybrid cloud can also be used to cope with planned increases in workload. This is sometimes referred to as 'surge computing' where a public cloud is used to perform periodic tasks that can be deployed easily on it. However, it should be noted that hybrid clouds do complicate cloud architecture for an organisation as they introduce the challenge of deciding how to distribute applications across both a public and private cloud. The relationship between data and processing resources must be considered. If large amounts of data must be loaded into a public cloud for a small amount of processing one should question the efficacy of using the hybrid cloud model. As a general rule of thumb, a hybrid cloud will be more effective if the amount of data being transferred is small or the application is stateless (i.e. doesn't have to maintain setting and configuration data).
The question we will now consider is what implications the different Cloud Infrastructure Models have for smart objects. The volume of data produced by smart objects will greatly expand the data processing undertaken by organisations in coming years. Will smart objects and smart networks be considered just another service offered by cloud computing? Probably not, as smart objects and networks will be deployed locally. By their nature, smart ecosystems provide computing functionality and devices for infrastructure such as buildings, pipelines and natural environments. While these systems may be deployed remotely from an organisation's premises this is not cloud computing per se. However, the different cloud computing infrastructure models do offer different benefits to these smart ecosystems.
As noted, public clouds offer computational facilities and IT infrastructure that would otherwise be unavailable to the typical enterprise. Given the requirements to capture and interpret data from potentially millions of smart objects, the data processing required would be ideally suited for a public cloud. This data processing could include translating the data into a meaningful measurement, interpreting the data, performing pattern analysis, data mining and generating Business Intelligence. Such computations are intensive and will often require the computing power offered by public clouds.
However, one should not disregard private clouds for smart networks. In the case of private clouds located on premises, the transfer of data from smart networks (such as a smart building) deployed locally will require much less bandwidth than transferring this data to a public cloud. Often, the smart network will be part of the overall network used by the organisation for its private cloud. Private clouds provided remotely will also have potential benefits for smart networks as the use of dedicated resources for processing and storing smart network data provides organisations with a finer level of control for managing the data provided by smart ecosystems.
Hybrid clouds and the affiliated strategy of surge computing may also be appropriate for smart networks in certain cases. The ad-hoc nature of smart networks implemented using wireless sensor networks, Bluetooth, Zigbee and other technologies means that demand for data processing and workload requirements can vary greatly. In such a circumstance, the ability to expand a private cloud with public cloud resources could be beneficial. However, the caveats regarding data transfer that were noted earlier should be taken into account.
There is no definitive answer as to the type of Cloud Computing Infrastructure Model that should be used for smart objects. It depends on the location of the smart network and the type of data processing being carried out. For smart meters, for example, the data is remotely captured at the utility customer's home or premises so one could argue that it is a moot point whether data is transferred to a (local) private or public cloud as the cost of data transfer may roughly be the same. This may apply to remote wireless sensor networks as well. However, there are other considerations regarding security and privacy that may mandate the use of a private cloud. In essence, organisations should consider the benefits and shortcomings of each Cloud Infrastructure Model for their smart networks and factor in legal and business as well as technical requirements. |
|
|
Smart Objects & Data Physics |
|
|
|
|
Written by martcon
|
|
Friday, 30 April 2010 11:54 |
|
Sun Microsystems have defined the term Data Physics as the consideration of the relationship between the processing elements of an Information System and the data on which these processing elements operate.
The 'Clouds' in Cloud Computing can be divided into storage clouds for data storage and compute clouds which carry out the processing. Storage clouds complement compute clouds. Since most compute clouds store data in the cloud rather than on a physical computer server it takes time to bring data to a server to be processed.
Sun Microsystems have created a simple equation for data physics. This equation describes how long it takes to move an amount of data between the places where it is generated, stored, processed and archived. As Sun point out, while Clouds are good at storing data they are not necessarily good at archiving and destroying data on a predefined schedule. In simple terms, large amounts of data or low bandwidth network connections lengthen the time it takes to move data. This can be expressed mathematically as time = (bytes*8) / bandwidth, where bandwidth is measured in bits per second and time in seconds.
In practical terms, this equation is relevant for both the moment-by-moment processing of data and for long term planning. For example, if an organisation's IT infrastructure needs increase, that organisation can expand its cloud by temporarily 'renting' resources from a Cloud Provider. This strategy is known as surge computing. However, as this process entails moving data from one cloud to another it can ultimately take more time than if the cloud had not been expanded. The data physics equation helps determine whether it makes sense to implement a surge computing strategy where it might take longer to move the data to a public cloud provided by a cloud vendor than it would to process the data within the current IT framework. Data physics can also help determine the cost of moving operations from one cloud provider to another. Whatever data has accumulated in one cloud provider's data centre must be moved to another data centre. Such a process takes time.
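The data physics equation, and the surge computing decision it supports, can be written as a short Python sketch. The function names and the processing-time parameters are our own assumptions for illustration; only the formula time = (bytes*8) / bandwidth comes from Sun's description.

```python
def transfer_seconds(num_bytes, bandwidth_bits_per_sec):
    """Data physics equation: time = (bytes * 8) / bandwidth."""
    return (num_bytes * 8) / bandwidth_bits_per_sec


def surge_is_worthwhile(num_bytes, bandwidth_bits_per_sec,
                        local_seconds, cloud_seconds):
    """True if moving the data and processing it remotely beats staying local.

    local_seconds and cloud_seconds are assumed estimates of the processing
    time in each environment; they are not part of Sun's equation.
    """
    move = transfer_seconds(num_bytes, bandwidth_bits_per_sec)
    return move + cloud_seconds < local_seconds
```

As a worked example, moving 1 TB (10**12 bytes) over a 100 Mbit/s link takes (10**12 * 8) / 10**8 = 80,000 seconds, or a little over 22 hours. If the same job would finish locally in 14 hours, surge computing does not pay off however fast the public cloud is, because the transfer alone takes longer.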
The cost of moving data can be expressed both in terms of time and in terms of bandwidth charges. Data physics is a reminder to consider the relationship between data and processing and that moving data from storage to processing can cost both time and money. Data stored without computing power nearby has limited value.
Data physics has implications for smart objects and smart networks. The volume of data produced by smart networks will increase exponentially in the coming years and this data will need to be stored by organisations. Given the volume of devices and networks producing this data, storage clouds would appear to be a logical choice for many organisations as the cost of the IT infrastructure otherwise required would be prohibitive. One must factor in, however, that the data produced by smart networks needs to be processed and transformed into meaningful information. Essentially, this means that data that is received by a storage cloud from a smart network will then need to be transferred to a compute cloud. Sun's data physics equation can be used to estimate the cost of moving this data for processing within the cloud.
The process of translating smart object data into meaningful information for an organisation can entail the use of data mining, business analytics and business intelligence techniques. Furthermore, this information will need to be loaded into other software and Information Systems such as Web Content Management Systems, Document Management Systems and Enterprise Resource Planning (ERP) Systems. It is the requirements of the latter two systems that are relevant here as there is currently debate as to whether ERP systems in particular should be maintained on premises or deployed on the cloud. If an organisation's smart object data resides on the cloud while its enterprise-wide systems are installed on premises, data will have to be transferred from the cloud back to the enterprise.
The final issue to consider here relates to data capture. When data is captured it will need to be transferred to the cloud using an Internet or telecommunications connection. While the data may be sent directly to the storage cloud, it is also possible that the data will first be sent to a compute cloud and will then need to be transferred on to the storage cloud. When architecting a system for capturing and processing smart object data the data physics equation is therefore likely to be very valuable in decision making.
There are two pools of resources to consider here. Smart objects offer a rich pool of data that can be transformed into information for decision making while Cloud Computing offers a rich pool of resources that can enable organisations to process and store huge volumes of data that would otherwise be infeasible. Cloud Computing can act as a complementary enabler for smart object networks but architects and developers need to consider the cost of data transfer when designing a smart object system that is deployed on the cloud.
|
|
Application Architecture White Papers |
|
|
|
|
Written by martcon
|
|
Wednesday, 21 April 2010 09:26 |
|
There are several white papers available on Application Architecture. Sun have a good white paper on Cloud Computing Architectures at http://www.sun.com/featured-articles/CloudComputing.pdf. Service Oriented Architecture (SOA) is described in a white paper by Versata at http://www.versata.com/documents/wp-SOA20041015-p.pdf while the '4+1' view of Application Architecture was published in a paper in 1995. See http://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-architecture.pdf. |
|
|
|
|