The 2009 Federal Stimulus Package devotes $20 billion to electronic medical records and another $7.2 billion to broadband Internet access. The spending is unprecedented not only in scale, but also in breadth of technology. Of the health-care money, $17.6 billion is allocated for incentive payments to health-care providers who adopt electronic health records and $2 billion for coordinating information technology.
This financing is intended to move health care to a new level of quality as well as cost containment.
The healthcare industry exists to heal the human body when it becomes ill. But it needs help replacing inefficient, manual, paper-based processes with automated methods that use collaboration and electronic forms and that enhance patient care.
Automating information capture through electronic registration, health histories, consent forms, and disclosure forms could streamline the entry process. A nurse or administrator can verify health data and extract it to electronic medical record (EMR) systems, then trigger other processes such as billing or scheduling automatically. As a patient's medical record evolves over time, content can be consolidated into a single secure file.
An Electronic Medical Record (EMR) is an electronic record of health-related information on an individual created by authorized staff within one health care organization.
An Electronic Health Record (EHR) is an electronic record of health-related information on an individual that conforms to nationally recognized interoperability standards and that is managed by authorized staff across more than one health care organization.
A Personal Health Record (PHR) is an electronic record of health-related information on an individual that conforms to nationally recognized interoperability standards and that can be drawn from multiple sources.
The goal is to leverage industry standards and integrate them with enterprise documentation systems through EMR, EHR and PHR.
Of the more than 800,000 clinicians in the US, only 17% have EHRs today. This leaves 664,000 who need EHRs. Over the next 5 years the early adopters will gain the full stimulus incentive amounts available in 2011-2012.
More than 100 companies are developing EHRs; the market leaders are eClinicalWorks, Allscripts, NextGen, GE Centricity, and Meditech/LSS.
One of the most cost-effective information technology upgrades for handling voluminous, data-intensive healthcare records is parallel processing. We can expect to see tremendous innovation in this area.
Sunday, May 3, 2009
Sunday, March 29, 2009
Primer on PLINQ
In this presentation, we present a primer on LINQ and then extend its reach to the parallel version, Parallel Language Integrated Query (or Parallel LINQ). Parallel LINQ (PLINQ) provides declarative data parallelism through the ParallelEnumerable and ParallelQuery classes. We concentrate on the ParallelQuery class, which has two methods (AsParallel, AsSequential).
Language Integrated Query (LINQ) is a special kind of search that locates data from various data sources. LINQ shortens queries while simplifying connections to a variety of data sources. It is all about searching efficiently and consistently with less effort. LINQ searches an array the same way one searches a Structured Query Language (SQL) server. It divides queries into four common types: LINQ-to-Objects and LINQ-to-XML (which we will find also support PLINQ), as well as LINQ-to-Dataset and LINQ-to-SQL.
The .NET Framework LINQ namespaces create a different kind of data connection. The System.Linq namespace contains all the basic classes for LINQ, and the System.Linq.Expressions namespace contains the classes, interfaces, and enumerations used to create expressions.
The three stages in a query operation are:
1. Get the data source. If the source is an array, you must declare the array and assign values.
2. Define the query expression.
3. Execute the query to return the results.
For example, a LINQ query that retrieves data from an array would show these three stages as:
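// Stage 1: get the data source.
int[] nums = new int[] { 0, 4, 2, 6, 3, 8, 3, 0, 4, 2, 1 };
// Stage 2: define the query expression.
var result = from n in nums
             where n < 5
             orderby n
             select n;
// Stage 3: execute the query by consuming the results.
foreach (int i in result)
    Console.WriteLine(i);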
The standard LINQ operators are a collection of 50 extension methods defined on the static Enumerable and Queryable classes in the System.Linq namespace. The operators fall into two categories: deferred operators, where the query does not execute until you consume the results, and operators that execute the query immediately.
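The difference between the two categories can be seen in a small sketch. It assumes that Where is a deferred operator and ToList an immediate one; the class and method names (DeferredExecutionDemo, Run) are illustrative and do not come from the presentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { count seen by the deferred query, count held by the snapshot }.
    public static int[] Run()
    {
        var nums = new List<int> { 0, 4, 2, 6, 3 };

        // Deferred: Where only describes the query; nothing runs yet.
        IEnumerable<int> deferred = nums.Where(n => n < 5);

        // Immediate: ToList runs the query now and stores the results.
        List<int> snapshot = nums.Where(n => n < 5).ToList();

        // Change the source after both queries have been defined.
        nums.Add(1);

        // The deferred query runs here and sees the new element;
        // the snapshot was taken before the change.
        return new[] { deferred.Count(), snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("deferred: {0}, snapshot: {1}", counts[0], counts[1]);
        // prints "deferred: 5, snapshot: 4"
    }
}
```

The deferred query is evaluated each time it is consumed, which is why it reflects the element added afterwards.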
LINQ uses keywords to build a query. They tell LINQ what to search for, starting with the from and in keywords. The where, orderby, join, and let keywords provide additional conditions. A minimal LINQ query has four parts: a variable that holds the query (identified by the var keyword), a range variable that stands for each individual value (from), the data source (in), and the values to return (select). For example:
var MyQuery =
from StringValue
in QueryString
select StringValue;
Parallel LINQ (PLINQ) is a form of declarative data parallelism. PLINQ uses the System.Linq namespace in the System.Threading.dll assembly. LINQ's declarative nature gives a PLINQ implementation the flexibility to parallelize queries. PLINQ lets LINQ developers use multiple cores for their LINQ expressions by running any LINQ-to-Objects query with data parallelism. PLINQ fully supports all .NET query operators and the existing LINQ model.
PLINQ uses two classes:
The System.Linq.ParallelEnumerable class, which exposes the parallel query operators
The System.Linq.ParallelQuery class, which exposes the AsParallel method
PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available. The change in programming model is tiny, meaning you don't need to be a concurrency guru to use it.
With PLINQ you don't need to move your entire database server processing logic over to in-memory LINQ-to-Objects queries in the client. Instead, PLINQ offers an incremental way of using parallelism for existing solutions.
Internally, PLINQ uses Tasks, and the parallelized query is processed by multiple threads. Preserving the order of the results is an extra step that gives up some of the performance gains.
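As a minimal sketch of the ordering trade-off, assuming the AsOrdered operator as it shipped in the .NET 4 release of PLINQ, a query can ask for source order to be preserved. The names OrderingDemo and OrderedSquares are hypothetical.

```csharp
using System;
using System.Linq;

public static class OrderingDemo
{
    // Squares 0..count-1 in parallel while keeping the source order.
    public static int[] OrderedSquares(int count)
    {
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .AsOrdered()        // request order preservation
                         .Select(n => n * n)
                         .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderedSquares(8)));
        // prints "0, 1, 4, 9, 16, 25, 36, 49"
    }
}
```

Without AsOrdered the same query may return the squares in any order, which leaves PLINQ free to skip the reordering work.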
The following listing, based on an example provided by Stephen Toub, finds prime numbers in a loop and times a sequential for loop against Parallel.For.
Counting Prime Numbers In For Loop
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks; // Parallel lives here in the .NET 4 release

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                // Sequential baseline: one thread fills an ordinary queue.
                Console.WriteLine(Time(delegate
                {
                    Queue<int> primes = new Queue<int>();
                    for (int i = 0; i < 2000000; i++)
                    {
                        if (CheckPrime(i)) primes.Enqueue(i);
                    }
                }));

                // Parallel version: iterations run on several threads,
                // so the shared queue must be thread-safe.
                Console.WriteLine(Time(delegate
                {
                    ConcurrentQueue<int> primes = new ConcurrentQueue<int>();
                    Parallel.For(0, 2000000, i =>
                    {
                        if (CheckPrime(i)) primes.Enqueue(i);
                    });
                }));

                // Press Enter to repeat the measurement.
                Console.ReadLine();
            }
        }

        // Trial division up to the square root of p.
        private static bool CheckPrime(int p)
        {
            if (p < 2) return false;
            int upperBound = (int)Math.Sqrt(p);
            for (int i = 2; i <= upperBound; i++)
            {
                if (p % i == 0) return false;
            }
            return true;
        }

        // Runs the action once and returns the elapsed wall-clock time.
        static TimeSpan Time(Action a)
        {
            Stopwatch sw = Stopwatch.StartNew();
            a();
            return sw.Elapsed;
        }
    }
}
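The listing above uses Parallel.For rather than a query. As a minimal sketch, assuming the .NET 4 release of PLINQ, the same prime test can also be written as a declarative query with AsParallel. The class and method names (PlinqPrimes, CountPrimes) are illustrative and are not part of the original example.

```csharp
using System;
using System.Linq;

public static class PlinqPrimes
{
    // Trial division up to the square root of p, as in the listing above.
    public static bool CheckPrime(int p)
    {
        if (p < 2) return false;
        int upperBound = (int)Math.Sqrt(p);
        for (int i = 2; i <= upperBound; i++)
        {
            if (p % i == 0) return false;
        }
        return true;
    }

    // Counts the primes in [0, limit) with a parallel query.
    public static int CountPrimes(int limit)
    {
        return Enumerable.Range(0, limit)
                         .AsParallel()
                         .Count(n => CheckPrime(n));
    }

    public static void Main()
    {
        Console.WriteLine(CountPrimes(100)); // prints 25
    }
}
```

Because the query only counts matches, no shared collection is needed and PLINQ combines the partial counts itself.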
SUMMARY
In this presentation, we began with a primer on LINQ and then extended its reach to the parallel version, Parallel Language Integrated Query (or Parallel LINQ). Parallel LINQ (PLINQ) provides declarative data parallelism through the ParallelEnumerable and ParallelQuery classes. We concentrated on the ParallelQuery class, which has two methods (AsParallel, AsSequential), and provided an application to demonstrate its capabilities.
REFERENCES
[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.
[2] Alesso, H. P. and Smith, C. F., Thinking on the Web: Berners-lee, Turing and Godel, John Wiley & Sons, Inc. 2008.
[3] Microsoft Visual Studio 2010
Thursday, January 15, 2009
C# Concurrent Programming
Concurrent Programming is being developed to be scalable and produce high performance as manycore hardware systems are now being released for PCs, routers, and small devices. Programming for parallel processing will require optimizing multiple task execution.
Microsoft is planning to release concurrent programming tools for Windows developers as fast as it can.
The Windows and .NET Framework platforms currently offer threading support for scheduling performance, synchronization APIs, and memory hierarchy awareness. Microsoft’s Visual Studio 2010 will add concurrency at the library level for native as well as managed .NET languages.
Traditionally, programming languages perform computations using syntax and semantic rules based upon the basic constructs: sequential statements, looping, branching and hierarchical organization. For C# managed code Microsoft has developed a Task Class, a Parallel Class, and a ParallelEnumerable Class to take advantage of these constructs for optimizing and simplifying concurrent programming.
Amdahl’s Law applies directly to the optimization of serial and parallel operations performed by software. These include branching operations (if, switch, etc.) and looping operations (do, for, and foreach), where repetitious work offers efficient points for concurrent programming methods.
While Microsoft has supported threading operations for many years, threads carry many hazards associated with simultaneous access to shared memory. Competing threads can jeopardize state management through deadlocks, livelocks, and race conditions. The next generation of Visual Studio 2010 tools addresses these difficulties.
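A race condition of the kind described above can be sketched in a few lines. This is an illustration under assumptions, not code from the original post: it uses Parallel.For and Interlocked.Increment from the .NET 4 release, and the names RaceDemo, UnsafeCount and SafeCount are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RaceDemo
{
    // Racy: total++ is a read-modify-write, so concurrent updates can be lost.
    public static int UnsafeCount(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, i => { total++; });
        return total; // may be less than iterations
    }

    // Safe: Interlocked.Increment performs the update atomically.
    public static int SafeCount(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref total); });
        return total; // always equals iterations
    }

    public static void Main()
    {
        Console.WriteLine("unsafe: {0}", UnsafeCount(1000000));
        Console.WriteLine("safe:   {0}", SafeCount(1000000));
    }
}
```

The unsafe result varies from run to run and from machine to machine, which is part of what makes such bugs hard to reproduce.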
As a result, Microsoft has developed a Task Parallel Library to exploit concurrency in program operations without the complexity and error-prone direct exposure to threads. In addition, Microsoft’s implementation of these features optimizes core utilization while minimizing complexity.
The following figure shows the Visual Studio 2010 model for building parallel tasks for managed and native code on top of both threads and thread pools.

Figure: Visual Studio 2010 Programming Model
Parallel Extensions for .NET 3.5 provides library-based support for concurrency with any .NET language, such as C++, C# and Visual Basic. It includes: Task Parallel Library, Parallel LINQ, and parallel method extensions.
Visual Studio 2010 and .NET 4.0 provide these extensions in the form of a library for rapid development:
Task Parallel Library (TPL)
Imperative task parallelism – Task Class
Imperative data parallelism – Parallel Class (Parallel.For)
Parallel LINQ (PLINQ)
Declarative data parallelism – ParallelEnumerable Class (AsParallel method)
The Task Parallel Library (TPL) provides support for imperative data and task parallelism and makes it easy to add both to an application.
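The two imperative styles can be sketched side by side. The example assumes the .NET 4 release API (Task.Factory.StartNew and Parallel.For); the class and method names (TplDemo, SumAndProduct, Squares) are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class TplDemo
{
    // Task parallelism: two different computations run as separate tasks.
    // Returns { 1 + 2 + ... + n, 1 * 2 * ... * n }.
    public static long[] SumAndProduct(int n)
    {
        Task<long> sum = Task.Factory.StartNew(() =>
        {
            long s = 0;
            for (int i = 1; i <= n; i++) s += i;
            return s;
        });
        Task<long> product = Task.Factory.StartNew(() =>
        {
            long p = 1;
            for (int i = 1; i <= n; i++) p *= i;
            return p;
        });
        // Reading Result blocks until the task has finished.
        return new[] { sum.Result, product.Result };
    }

    // Data parallelism: the same operation is applied to every index.
    public static int[] Squares(int count)
    {
        var result = new int[count];
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, count, i => { result[i] = i * i; });
        return result;
    }

    public static void Main()
    {
        long[] r = SumAndProduct(5);
        Console.WriteLine("{0} {1}", r[0], r[1]);          // prints "15 120"
        Console.WriteLine(string.Join(", ", Squares(5)));  // prints "0, 1, 4, 9, 16"
    }
}
```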
Parallel LINQ (PLINQ) provides support for declarative data parallelism. PLINQ is an extension of LINQ where the query is run in parallel. PLINQ takes advantage of TPL by taking query iterations and assigning work units to threads (typically processor cores). Adding concurrent programming capabilities to LINQ is a natural extension of LINQ.
Coordination Data Structures (CDS) provide support for work coordination and managing shared state.
.NET application developers can look forward to the release of Visual Studio 2010 for the tools and techniques to write simple, efficient, thread-safe code for manycore systems.
[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.
[2] Alesso, H. P. and Smith, C. F., Thinking on the Web: Berners-lee, Turing and Godel, John Wiley & Sons, Inc. 2008.
[3] Microsoft Visual Studio 2010
Sunday, November 9, 2008
The Era of Amdahl’s Law
Today, the transition is being made from the individual knowledge worker in the Era of Moore’s Law to the collective Web workers in the Era of Amdahl’s Law where Web workers create, sort, search, and manage information between knowledge products, devices, and people.
While corporations were seeking to increase each individual’s productivity in the Era of Moore’s Law, now distributed groups working in parallel are reaping the benefits of plummeting transaction costs over the Web.
In 2005, IBM announced that it had doubled the performance of the world's fastest computer, Blue Gene/L, from 136.8 trillion calculations per second (teraflops) to 280.6 teraflops. The Blue Gene system is the new generation of massively parallel supercomputers in the IBM System Blue Gene Solution series: the epitome of centralized computer power.
At the other end of the scale, Google has developed the largest parallelized computer complex in the world by inventing its own Googleware technology for parallel processing across distributed servers, microchips, and databases.
As a result, parallel processing is affecting all aspects of the Information Revolution: the mainframe computers have become supercomputers with massively parallel microchip configurations; the individual personal computers of the Era of Moore’s Law now have multicore processors for application processing; and the Web is utilizing applications such as Googleware based upon a vast parallelized computer complex with its specialized concurrent programming.
Parallel processing has infiltrated all aspects of computer usage because the limitations of Moore’s Law require compensation through Amdahl’s Law.
Amdahl's law describes how much a program can theoretically be sped up by additional computing resources, based on the proportion of parallelizable and serial components. Let F be the fraction of the calculation that must be executed serially:
F = s / (s + p)
where s = serial execution and p = parallel execution.
Then Amdahl's law says that on a machine with N processors, the maximum speedup is given by:
Speedup ≤ 1 / (F + (1-F) / N)
As N approaches infinity, the maximum speedup converges to 1/F, or (s + p)/s.
This means that for a program with fifty percent of the processing executed serially, the speedup is at most a factor of two, regardless of how many processors are available. For a program where ten percent must be executed serially, a factor of ten is the maximum speedup.
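These limits are easy to check numerically. The following sketch evaluates the bound 1 / (F + (1-F) / N) for the two serial fractions mentioned above; the class name Amdahl, the method name MaxSpeedup and the chosen processor counts are assumptions made for illustration.

```csharp
using System;

public static class Amdahl
{
    // Upper bound on speedup for serial fraction f on n processors.
    public static double MaxSpeedup(double f, int n)
    {
        return 1.0 / (f + (1.0 - f) / n);
    }

    public static void Main()
    {
        double[] serialFractions = { 0.5, 0.1 };
        int[] processorCounts = { 2, 8, 64, 1024 };

        foreach (double f in serialFractions)
        {
            foreach (int n in processorCounts)
            {
                Console.WriteLine("F = {0:0.0}, N = {1,4}: speedup <= {2:0.00}",
                                  f, n, MaxSpeedup(f, n));
            }
            // The bound approaches 1/F as N grows.
            Console.WriteLine("F = {0:0.0}, limit: {1:0.00}", f, 1.0 / f);
        }
    }
}
```

For F = 0.5 the bound never reaches 2 however large N becomes, and for F = 0.1 it never reaches 10.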
All computer applications must now be translated from sequential programming into parallel processing methods. As a result, the third wave of computing has become the Era of Amdahl’s Law, where the information environment of each person is connected through the Web to a variety of multicore devices.
Manycore systems hold the promise of 10 to 100 times the processing power in the next few years. However, as software developers transition from writing serial programs to writing parallel programs, there will be pitfalls in creating robust and efficient parallel code.
Even if current applications don't have much parallel functionality, s and p can be changed:
1. Increase p by doing more of the same: Increase the volume of data processed by the parts that are parallelizable. This is Gustafson's Law.
2. Increase p by adding new features that are parallelizable.
3. Reduce s by pipelining.
If we keep run time constant and focus instead on increasing the problem size, the total work done in a fixed time is:
Total Work = s + N * p
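As a worked illustration of the fixed-time view, take the run time on one processor as the unit, so that s + p = 1. The values s = 0.1, p = 0.9 and N = 8 are assumptions chosen only for the arithmetic.

```latex
\begin{align*}
\text{Total Work} &= s + N\,p = 0.1 + 8 \times 0.9 = 7.3 \\[4pt]
\text{Scaled speedup} &= \frac{s + N\,p}{s + p} = \frac{7.3}{1} = 7.3 \\[4pt]
\text{Amdahl bound (fixed problem size)} &= \frac{1}{F + (1-F)/N}
  = \frac{1}{0.1 + 0.9/8} = \frac{1}{0.2125} \approx 4.71
\end{align*}
```

With the same serial fraction, growing the problem with the machine yields about 7.3 times the work in the same time, whereas speeding up the original fixed-size problem is limited to roughly 4.71 times.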
Besides solving bigger versions of the same problem, we also have the option of adding new features.
Normally, getting just N-fold speedups is considered the Holy Grail, but there are ways to leverage data locality and/or perform speculative and cancelable execution to achieve superlinear speedups.
References:
[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.
[2] Sutter, H., Break Amdahl’s Law!, Dr. Dobb’s Portal, Jan. 17, 2008.
[3] Goetz, B., et. al., Java: Concurrency in Practice, Addison-Wesley, Stoughton, Massachusetts, USA, 2008.
Sunday, August 10, 2008
Concurrent Programming
Mighty oaks from tiny acorns grow or so an ancient proverb claims — consider the microchip. Smaller than a penny, it is the brain of every digital device in the world. Chips connect circuits, computers, handheld devices as well as satellites and an endless list of electronics. As the centerpiece of the information revolution, the chip is the driving force behind innovation as it follows an ambiguous rule called ‘Moore’s Law.’
In 1965, Gordon Moore, who shared in the invention of the microprocessor chip and went on to co-found the Intel Corporation, wrote an article where he noted that the density of components on semiconductor chips had doubled yearly since 1959. This annual doubling in component density amounted to an exponential growth rate, widely known as Moore's Law.
While Moore’s Law is not a physical law, like gravity, it is an empirical observation that the capacity of memory chips has risen from one thousand bits in 1971 to one million bits in 1991 and to one billion bits by 2001. The billion-bit semiconductor memory chip represented an extraordinary six orders of magnitude in growth, and a similar growth rate has also been seen in the capability of microprocessor chips to process data.
While many have speculated on the future of Moore’s law, some have concluded that instead of focusing on obtaining greater speed from a single processor, innovators should develop multi-core processors. Instead of scaling clock speed, which pushes power usage and heat emission to unacceptable levels, chip manufacturers have begun adding additional CPUs, or “cores”, to the microprocessor package in order to increase processing power. By working in parallel, the cores increase the total 'throughput' of the device. Quad cores are already being produced commercially. The advances in parallel hardware development require similar advances in optimizing the execution of multiple tasks working in parallel, called Concurrent Programming.
While a single-core computer can use multiple threads to interleave processes, true processing parallelism doesn't occur without multiple CPUs. Distributed computing uses parallel work units distributed across numerous machines. However, distributed computing incurs additional requirements for task management.
Concurrent programming utilizes task management and communication. The task manager distributes work units to available threads, while task communication uses state and memory sharing to establish the initial parameters for a task and to collect the result of the task's work. Task communication requires locking mechanisms to ensure performance gains and to prevent the subtle bugs that arise when multiple tasks overwrite the same memory locations. Synchronization of state and memory can be controlled by using locks, monitors, and other techniques that block threads from altering state while another thread makes changes.
Microsoft’s Parallel Computing Development Center provides support for parallelism, programming models, libraries and tools with F#, Task Parallel Library (TPL), Parallel Extensions Assembly (PFX), and PLINQ.
F# is a typed functional programming language for the .NET framework that does not directly support concurrent programming. However, it does include asynchronous workflows for I/O. TPL is designed to assist in writing managed code for multiple processors. PFX is being folded into TPL. PLINQ is LINQ where the query is run in parallel. PLINQ takes advantage of TPL by taking query iterations and assigning work units to threads (typically processor cores).
Collectively these efforts help the .NET Parallel class ease the development of threaded applications. Nevertheless, state and shared memory issues are left to the programmer to solve. Adding concurrent programming capabilities to LINQ seems a natural extension.
As a result, the next series of technological advances in the information revolution will be strongly dependent on concurrent programming.
Threads
Today work items are run by creating Threads such as:
Thread t = new Thread(DoSomeWorkMethod);
t.Start(someInputValue);
For 10 work items, we could create 10 threads, but this is not ideal because of context switching, cache invalidation, and the memory consumed by each thread’s stack. An alternative is to use the .NET ThreadPool class:
ThreadPool.QueueUserWorkItem(
DoSomeWorkMethod, someInputValue);
However, this lacks the richness of the full Thread API, since we do not get a reference to the work item and there is no explicit support for knowing when it has completed.
Parallel Extensions introduces a new class, Task, that is similar to Thread but with semantics close to those of ThreadPool. A code snippet for the new Task class is:
Task t = Task.Create(DoSomeWorkMethod,
someInputValue);
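The benefit of getting a handle back from a pooled work item is not specific to .NET. As a rough analogue only (this is Python's standard concurrent.futures module, not the Parallel Extensions API, and do_some_work is a hypothetical stand-in for DoSomeWorkMethod), submitting work to a pool returns an object that can be waited on and queried for completion:

```python
from concurrent.futures import ThreadPoolExecutor

def do_some_work(value: int) -> int:
    """Hypothetical work item: square the input."""
    return value * value

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() queues the work on the pool and returns a handle to it.
    future = pool.submit(do_some_work, 7)
    result = future.result()   # blocks until the work item has completed
    finished = future.done()   # the handle can also be polled for completion

print(result, finished)  # 49 True
```

The work still runs on pooled threads, as with ThreadPool, but the caller keeps a reference to it, which is the gap the Task class is meant to fill.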
See References for further material.
REFERENCES
[1] Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons Inc., New York, NY, 2007.
[2] Clifton, M., “Concurrent Programming - A Primer,” 3 January 2008.
[3] Microsoft, Concurrent Programming, 2008.
[4] Moth, D., Parallel Extensions to the .NET Framework, 28 February 2008.
Monday, June 9, 2008
Adding Meaningful Content with Resource Description Framework (RDF)
In 1990, Tim Berners-Lee laid the foundation for the World Wide Web with three basic components: HTTP (HyperText Transfer Protocol), URLs (Uniform Resource Locators), and HTML (HyperText Markup Language). These elements were the essential ingredients leading to the explosive growth of the World Wide Web. The original concept for HTML was a modest one, where browsers could simply view information on Web pages. An HTML page can be written as a simple text file, and the language can be easily mastered by a high school student.
However, HTML wasn't extensible, in that it had specifically designed tags that required vendor agreement before changes could be made. The eXtensible Markup Language (XML) overcame this limitation. Proposed in late 1996 by the World Wide Web Consortium (W3C), XML offered a way to manipulate a developer's structured data. XML simplified the process of defining and using metadata and provided a good representation of extensible, hierarchical, formatted information.
For the Web to provide meaningful content and capabilities today, it will have to add new layers of markup languages, starting with Resource Description Framework (RDF). Figure 1 shows the projected pyramid of Markup Languages for the Web.

Figure 1
Resource Description Framework (RDF) has been developed by the W3C in order to extend XML and make work easier for autonomous agents and automated services by introducing a rudimentary semantic capability.
RDF uses a simple relational model that allows structured data to be mixed, exported and shared across different applications. RDF defines a subject, a predicate, and an object, which together form an RDF triple.
Consider the statement: The book is entitled, Gone with the Wind.
A simple XML representation might be:
{book}
{title} Gone with the Wind. {/title}
{/book}
(Note: we use "{" instead of "<" brackets around tags).
The grammatical sentence has three basic parts: a subject [The book], a predicate [is entitled], and an object [Gone with the Wind]. A machine, however, could not make an inference based upon the simple XML representation of the sentence.
For machines to make an inference automatically, it is necessary to add RDF to the traditional HTML and XML markup.
The basic RDF model produces a triple, where a resource (the subject) is linked through an arc labeled with a property (the predicate) to a value (the object). Figure 2 shows the graphical representation of the RDF statement.

Figure 2
The RDF statement can be represented as a triple: (subject, predicate, object) and also serialized in XML syntax as:
{?xml version="1.0"?}
{rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"}
{rdf:Description rdf:about="SUBJECT"}
{dc:PREDICATE}OBJECT{/dc:PREDICATE}
{/rdf:Description}
{/rdf:RDF}
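To make the serialization concrete, here is a minimal Python sketch that reads a small RDF/XML document (written with real angle brackets) and recovers the triple. The subject URI under example.org is an invented placeholder, the helper name extract_triples is hypothetical, and the code handles only this flat rdf:Description pattern; it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The subject URI below is a made-up example, not a real resource.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/books/gone-with-the-wind">
    <dc:title>Gone with the Wind</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from flat rdf:Description blocks."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree expands the dc: prefix to its full namespace URI.
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples

triples = extract_triples(rdf_xml)
for triple in triples:
    print(triple)
```

Here the predicate comes back as the full Dublin Core namespace URI followed by the local name title, which is what lets a machine tell this "title" apart from a title element in some other vocabulary.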
A collection of interrelated RDF statements is represented by a graph of interconnected nodes. The nodes are connected via various relationships. For example, let's say each node represents a person. Each person might be related to another person because they are siblings, parents, spouses, or friends.
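A graph of this kind can be sketched as a plain list of triples. In the Python example below, the ex: identifiers and relationship names are invented for illustration (only dc:title comes from the Dublin Core vocabulary used above), and match is a hypothetical helper that treats None as a wildcard.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# A tiny graph: each triple is one labeled arc between two nodes.
graph = [
    Triple("ex:book1", "dc:title", "Gone with the Wind"),
    Triple("ex:alice", "ex:siblingOf", "ex:bob"),
    Triple("ex:alice", "ex:friendOf", "ex:carol"),
    Triple("ex:bob", "ex:spouseOf", "ex:dana"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

alice_links = match(graph, subject="ex:alice")
titles = [t.obj for t in match(graph, predicate="dc:title")]
print(alice_links)
print(titles)  # ['Gone with the Wind']
```

Following arcs from node to node with queries like these is the basic operation that lets software traverse relationships such as sibling, spouse, or friend.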
There are many RDF applications available; for example, see Dave Beckett's Resource Description Framework (RDF) Resource Guide.
Many communities have proliferated on the Internet, from companies to professional organizations to social groupings. The Friend of a Friend (FOAF) vocabulary, originated by Dan Brickley and Libby Miller, gives a basic expression for community membership. FOAF expresses personal information and relationships, and is a useful building block for creating information systems that support online communities. Search engines can find people with similar interests through FOAF.
FOAF is simply an RDF vocabulary. Its typical method of use is akin to that of RSS.
For more on RDF and additional markup languages see the references below.
REFERENCES
Connections: Patterns of Discovery
Developing Semantic Web Services
Web Site: Video Software Lab
Friday, May 16, 2008
Synchronizing Video, Text, and Graphics with SMIL
Today, video is all over the Web. Top networks and media companies now display your favorite shows online, from nail-biting dramas, high-scoring sports, and almost-real reality TV shows to classic feature films.
Apple TV and iTunes stream 720p high-definition (HD) video, and the Hulu.com video Web site has started to add high-definition videos using Adobe Flash Player 9.0 with H.264 encoding.
Synchronized Multimedia Integration Language (SMIL) is the W3C standard language for streaming media. It provides a time-based, synchronized environment to stream audio, video, text, images and Flash files. The key to SMIL is its use of blocks of XML (eXtensible Markup Language).
Pronounced "smile," SMIL is an XML-compliant markup language that coordinates when and how multimedia files play. Using SMIL, you can:
* describe the temporal behavior of the presentation
* describe the layout of the presentation on a screen
* associate hyperlinks with media objects
SMIL players are client applications that receive and display integrated multimedia presentations. SMIL servers are responsible for providing content channels and serving presentations to clients. Although SMIL itself is an open technology, some of the players and servers use proprietary techniques to handle multimedia streaming and encoding.
A SMIL file (extension .smil) can be created with a text editor and saved as plain text. In its simplest form, a SMIL file lists multiple media clips played in sequence:
<smil>
<body>
<video src="rtsp://yourserver.yourcompany.com/video1.rm"/>
<video src="rtsp://yourserver.yourcompany.com/video2.rm"/>
<video src="rtsp://yourserver.yourcompany.com/video3.rm"/>
</body>
</smil>
The master SMIL file is a container for the other media types. It provides the positions for the RealPix graphics files to appear and it starts and stops the video.
The master file is divided into three sections:
* Head: The head element contains information that is not related to the temporal behavior of the presentation. The "head" element may contain any number of "meta" elements and either a "layout" element or a "switch" element. The head contains the meta information, including copyright info, author of the page, and the title.
* Regions: The different regions, which are defined inside the REGION tags, control the layout in the RealPlayer window.
* Body: The body of the SMIL file describes the order in which the presentations will appear. The PAR tags mean that the VideoChannel, PixChannel and TextChannel will be displayed in parallel.
The regions are arranged in a layout similar to the cells in a table. The LEFT and TOP attributes control the position of the different regions along with HEIGHT and WIDTH attributes that specify their size. SMIL has many similarities to HTML, but also some important differences. The SMIL mark-up must start with a smil tag and end with the smil closing tag. All other mark-up appears between these two tags.
A SMIL file can include an optional header section defined by head tags. It requires a body section defined by body tags. Attribute values must be enclosed in double quotation marks. File names in SMIL must match the names on the server exactly: they can use upper, lower, or mixed case, but the case must be identical to how the file appears on the server. SMIL files are saved with the extension .smi or .smil.
The SMIL sequential (seq) and parallel (par) tags allow you to structure your media. Use the seq tag to play various clips in sequence. In the following, the second video clip begins when the first video clip finishes.
<seq>
<video src="videos/video1.rm"/>
<video src="videos/video2.rm"/>
</seq>
To play two or more clips at the same time, use the par tag. Here the video clip plays while the text of the lyrics scrolls in synchronization.
<par>
<video src="videos/video1.rm"/>
<textstream src="lyrics/words.rt"/>
</par>
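Because SMIL is XML, files like these can also be generated by a program instead of typed by hand. The following Python sketch uses the standard xml.etree.ElementTree module to build a body containing one seq group and one par group. The function name build_smil is hypothetical, and the clip paths are the same placeholders used in the examples above.

```python
import xml.etree.ElementTree as ET

def build_smil(sequence, parallel):
    """Build a SMIL document as a string.

    sequence: video sources played one after another (inside seq).
    parallel: (tag, src) pairs played at the same time (inside par).
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")
    for src in sequence:
        ET.SubElement(seq, "video", {"src": src})
    par = ET.SubElement(body, "par")
    for tag, src in parallel:
        ET.SubElement(par, tag, {"src": src})
    # The serializer takes care of quoting attribute values.
    return ET.tostring(smil, encoding="unicode")

smil_text = build_smil(
    ["videos/video1.rm", "videos/video2.rm"],
    [("video", "videos/video1.rm"), ("textstream", "lyrics/words.rt")],
)
print(smil_text)
```

Generating the markup this way guarantees matching opening and closing tags and properly quoted attribute values, two of the rules listed earlier.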
When RealServer G2 streams parallel groups, it ensures that the clips stay synchronized. If some video frames don't arrive, RealServer either drops those frames or halts playback until the frames do arrive. SMIL timing elements let you specify when a clip starts playing and how long it plays. If you do not set a timing value, the clips start and stop according to their normal timelines and their positions within par and seq groups. The easiest way to designate a time is with the shorthand markers h, min, s, and ms.
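As a small illustration of the shorthand markers, the Python sketch below converts values such as 2h, 3min, 45s, or 500ms into seconds. The function name to_seconds is hypothetical, and the sketch covers only a number followed by a single marker, not every time format a SMIL player may accept.

```python
import re

# Length of each shorthand unit in milliseconds.
_UNIT_MS = {"h": 3_600_000, "min": 60_000, "s": 1_000, "ms": 1}
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)$")

def to_seconds(value: str) -> float:
    """Convert a shorthand time value such as '3min' or '500ms' to seconds."""
    found = _PATTERN.match(value.strip())
    if found is None:
        raise ValueError(f"not a shorthand time value: {value!r}")
    number, unit = found.groups()
    return float(number) * _UNIT_MS[unit] / 1000

for sample in ("2h", "3min", "45s", "500ms", "1.5s"):
    print(sample, "=", to_seconds(sample), "seconds")
```

A value with no recognized marker raises a ValueError instead of being silently guessed at.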
For more information about technology innovations and Web video see the following references.
REFERENCES:
Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons, Inc., 2008.
Alesso, H. P. and Smith, C. F., e-Video: Producing Internet Video as Broadband Technologies Converge (with CD-ROM), Addison-Wesley, 2000.
Web Site: Video Software Lab